Concept
Why Consensus?
Why Consensus?
Consensus is the problem of getting a group of independent machines to agree on a single value — and to keep agreeing on it — even when some of them crash, restart, or are temporarily unreachable. In practice we almost never agree on just one value; we agree on an ordered sequence of values, which we call a replicated log.
Why does this matter? Almost every stateful distributed system reduces to consensus underneath:
- Leader election – exactly one node must believe it is the primary at a time.
- Replicated state machines – every replica applies the same commands in the same order, so they end up in identical states. This is the famous
state machine replicationresult: agree on the log, and the rest follows deterministically. - Configuration & metadata – cluster membership, shard assignments, distributed locks, and feature flags must be consistent everywhere.
A correct consensus algorithm provides three guarantees:
- Agreement / Safety – no two nodes ever decide on different values for the same log slot. This must hold always, even during partitions.
- Validity – a decided value was actually proposed by some node (no values invented out of thin air).
- Liveness / Termination – when the network is healthy enough and a majority is reachable, the system eventually decides. The FLP impossibility result says you cannot guarantee both safety and liveness in a fully asynchronous network with even one crash, so real algorithms never sacrifice safety and only make progress when conditions allow.
The single most important idea, repeated below in many forms: progress requires a majority (a quorum). A minority can never decide anything on its own — that is precisely what stops split-brain.