Reliability

Using TLA+ to Find a Byzantine Fault in Our Distributed Consensus Protocol

March 25, 2026 · 16 min read

Formal methods caught a subtle liveness violation that 18 months of chaos engineering and 10,000 hours of simulation had missed.

Overview

This article is part of Softmotion's research blog — technical writing from the engineers building datacenter infrastructure, AI systems, voice servers, and distributed systems at scale.

Full article coming soon. We publish new technical deep-dives weekly.