Concept

Why Consensus?

Consensus is the problem of getting a group of independent machines to agree on a single value — and to keep agreeing on it — even when some of them crash, restart, or are temporarily unreachable. In practice we almost never agree on just one value; we agree on an ordered sequence of values, which we call a replicated log.

Why does this matter? Almost every stateful distributed system reduces to consensus underneath:

  • Leader election – at most one node may act as the primary at any given time.
  • Replicated state machines – every replica applies the same commands in the same order, so they end up in identical states. This is the famous state machine replication result: agree on the log, and the rest follows deterministically.
  • Configuration & metadata – cluster membership, shard assignments, distributed locks, and feature flags must be consistent everywhere.
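
The replicated state machine idea can be illustrated with a minimal sketch. It assumes a toy key-value store as the state machine; the names KVReplica, replay, and agreed_log are hypothetical and exist only for this illustration. The point is that once the log is agreed, each replica can replay it independently and arrive at the same state.

```python
from dataclasses import dataclass, field


@dataclass
class KVReplica:
    """A toy deterministic state machine: an in-memory key-value store."""
    state: dict = field(default_factory=dict)

    def apply(self, command: tuple) -> None:
        # Commands are ("set", key, value) or ("delete", key).
        op, key, *rest = command
        if op == "set":
            self.state[key] = rest[0]
        elif op == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")


def replay(log: list) -> dict:
    """Apply every command in log order to a fresh replica and return its state."""
    replica = KVReplica()
    for command in log:
        replica.apply(command)
    return replica.state


# The log that consensus is assumed to have already agreed on.
agreed_log = [("set", "x", 1), ("set", "y", 2), ("delete", "x"), ("set", "y", 3)]

# Three replicas replay the same log independently.
replica_states = [replay(agreed_log) for _ in range(3)]
```

All three entries of replica_states are equal because apply is deterministic and the order is fixed. Replaying the same commands in a different order can produce a different state, which is why the order itself has to be agreed.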

A correct consensus algorithm provides three guarantees:

  • Agreement / Safety – no two nodes ever decide on different values for the same log slot. This must always hold, even during partitions.
  • Validity – a decided value was actually proposed by some node (no values invented out of thin air).
  • Liveness / Termination – when messages are delivered in a timely way and a majority of nodes is reachable, the system eventually decides. The FLP impossibility result shows that no deterministic algorithm can guarantee both safety and termination in a fully asynchronous network if even one node may crash, so practical algorithms never sacrifice safety and make progress only when conditions allow.
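
The two safety-style guarantees above can be stated as simple checks over what each node has decided. This is a sketch for testing or reasoning, not part of any particular algorithm; the function names and the shape of the decisions mapping (node name to a dict of slot to value) are assumptions made for this example. Liveness is not checked here because it is a property of how an execution unfolds over time, not of a snapshot.

```python
def check_agreement(decisions: dict) -> bool:
    """Return True if no log slot has two different decided values.

    decisions maps node name -> {slot index: decided value}.
    """
    seen = {}
    for node_log in decisions.values():
        for slot, value in node_log.items():
            # setdefault records the first value seen for a slot;
            # any later, different value violates agreement.
            if seen.setdefault(slot, value) != value:
                return False
    return True


def check_validity(decisions: dict, proposed: set) -> bool:
    """Return True if every decided value was actually proposed by some node."""
    return all(
        value in proposed
        for node_log in decisions.values()
        for value in node_log.values()
    )


# Node "b" lags behind but does not contradict the others.
healthy = {"a": {0: "x", 1: "y"}, "b": {0: "x"}, "c": {0: "x", 1: "y"}}

# Two nodes decided different values for slot 0: a safety violation.
split_brain = {"a": {0: "x"}, "b": {0: "z"}}
```

A node that has decided fewer slots, like "b" in healthy, does not violate agreement; only conflicting values for the same slot do.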

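The single most important idea, repeated below in many forms: progress requires a majority (a quorum). Any two majorities of the same cluster share at least one node, so two disjoint groups can never both decide. A minority can never decide anything on its own, and that is precisely what stops split-brain.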
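
The majority rule can be checked with a little arithmetic. The sketch below assumes a fixed cluster of n nodes and simple majority quorums; the helper names majority, tolerated_failures, and quorums_always_intersect are hypothetical. It brute-forces the claim that any two majorities overlap, which is feasible only for small n.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that is more than half of an n-node cluster."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that can be down while a majority is still reachable."""
    return n - majority(n)


def quorums_always_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority-sized node sets overlaps."""
    nodes = range(n)
    q = majority(n)
    return all(
        set(a) & set(b)
        for a in combinations(nodes, q)
        for b in combinations(nodes, q)
    )
```

For example, a 5-node cluster needs 3 nodes to decide and survives 2 failures, while a 4-node cluster also needs 3 but survives only 1. The overlap holds because two quorums together contain more than n node slots, so at least one node must belong to both.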