Paxos: Lamport’s Beautiful Nightmare
Let us begin with the protocol that launched a thousand implementations, none of which agree with each other. Paxos is the foundational consensus algorithm in distributed systems, the way quicksort is foundational in algorithms courses — except that nobody has ever been confused about how quicksort works after reading the original paper.
Leslie Lamport first described Paxos in 1989 in “The Part-Time Parliament,” a paper written as an archaeological report about a fictional Greek island where legislators needed to agree on laws despite irregular attendance. The paper languished in review for nearly a decade (reviewers wanted the fictional frame dropped; Lamport declined to drop it) and was finally published in 1998. Lamport has publicly stated that the reviewers simply didn’t get the joke. The reviewers have, diplomatically, suggested that perhaps a foundational result in distributed computing deserved a presentation style that didn’t require readers to first decode an extended metaphor about olive oil merchants.
In 2001, Lamport published “Paxos Made Simple,” a paper whose opening line — “The Paxos algorithm, when presented in plain English, is very simple” — remains the most audacious claim in the history of computer science.
Let us see if he was right.
The Problem: Single-Decree Consensus
Before we get to the protocol, let us be precise about what we are trying to solve. Single-decree Paxos solves the following problem:
- A collection of processes may propose values.
- A single value must be chosen.
- Once a value is chosen, processes should be able to learn the chosen value.
And it must do so despite:
- Processes may crash and restart.
- Messages may be delayed, duplicated, reordered, or lost.
- Messages are not corrupted (we assume non-Byzantine faults).
This is the consensus problem from Chapter 3, instantiated for crash-fault environments with asynchronous networks. FLP tells us we cannot solve this deterministically while guaranteeing liveness, so Paxos makes the pragmatic choice: safety is always guaranteed, liveness is guaranteed only when the system is “sufficiently synchronous” (we’ll define what that means later, or rather, we’ll wave our hands about it the same way the paper does).
The Three Roles
Paxos defines three roles:
Proposers — These are the processes that propose values. They initiate the protocol by suggesting a value they’d like the system to agree on. In practice, proposers are typically servers that have received a client request.
Acceptors — These are the processes that vote on proposals. They form the “memory” of the system. A value is chosen when a majority (quorum) of acceptors have accepted it. You need at least 2f+1 acceptors to tolerate f failures.
Learners — These are the processes that need to find out what value was chosen. In many implementations, learners are the same processes as acceptors, but conceptually they are distinct.
A single physical process may play all three roles simultaneously — and in most real deployments, it does. But separating the roles is essential for understanding the protocol. This is one of those things that seems like pedantic academic nonsense until you try to implement it, at which point you realize the role separation is doing real work in the correctness argument.
Proposal Numbers: The Unsung Hero
Before we dive into the phases, we need to talk about proposal numbers (also called ballot numbers). Every proposal in Paxos carries a unique number, and these numbers must be:
- Totally ordered — Any two proposal numbers can be compared.
- Unique — No two proposers ever use the same proposal number.
- Increasing — A proposer always uses a higher number than any it has seen before.
The standard trick is to use a pair (sequence_number, proposer_id) where the proposer_id breaks ties. So proposer 1 might use numbers 1.1, 2.1, 3.1 and proposer 2 might use 1.2, 2.2, 3.2, ordered lexicographically by the first component, then the second.
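In a language with lexicographic tuple comparison, this ordering comes for free. A quick illustration (Python; the variable names are invented here):

```python
# Proposal numbers as (sequence, proposer_id) tuples.
# Python compares tuples lexicographically, which is exactly the
# total order Paxos needs: sequence first, proposer_id as tiebreak.
p1_round2 = (2, 1)   # proposer 1, second round
p2_round2 = (2, 2)   # proposer 2, second round
p1_round3 = (3, 1)   # proposer 1, third round

assert p2_round2 > p1_round2   # same sequence: proposer id breaks the tie
assert p1_round3 > p2_round2   # higher sequence always wins
```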
This seems trivial. It is not. In a real implementation, you need to ensure that proposal numbers survive crashes (otherwise a restarting proposer might reuse old numbers), that they increase monotonically (otherwise the proposer gets permanently ignored), and that the gap between a proposer’s numbers isn’t so large that it burns through the number space (which matters in long-running systems, though you’d need to be impressively careless).
function generate_proposal_number(proposer_id, last_seen_number):
    // Ensure we always go higher than anything we've seen
    new_sequence = last_seen_number.sequence + 1
    proposal = (new_sequence, proposer_id)
    // CRITICAL: persist this before using it
    persist_to_disk(proposal)
    return proposal
Phase 1: Prepare and Promise
Phase 1 is where a proposer stakes its claim. Think of it as the proposer walking into a room and asking, “Is anyone already committed to something? And will you promise to listen to me?”
The Proposer’s Side (Prepare)
The proposer selects a proposal number n that is higher than any it has previously used, and sends a Prepare(n) message to a majority of acceptors.
Note: the proposer does NOT include a value in the Prepare message. This is crucial. The proposer is not yet proposing a value — it is asking for permission to propose and learning what, if anything, the acceptors have already accepted.
function proposer_start(value):
    n = generate_proposal_number(my_id, highest_seen)
    highest_seen = n
    // Send Prepare to a majority of acceptors
    for acceptor in choose_majority(all_acceptors):
        send(acceptor, Prepare(n))
    // Wait for responses (with timeout)
    promises = wait_for_majority_responses(timeout)
    if |promises| < majority_size:
        // Failed to get a majority — retry with higher number
        return proposer_start(value)
    // Phase 1 complete — move to Phase 2
    return proposer_phase2(n, value, promises)
The Acceptor’s Side (Promise)
When an acceptor receives a Prepare(n) message, it does one of two things:
If n is greater than any proposal number it has already responded to: The acceptor promises not to accept any proposal numbered less than n, and responds with Promise(n, accepted_proposal, accepted_value) — where accepted_proposal and accepted_value are the highest-numbered proposal it has already accepted (if any).
If n is less than or equal to a proposal number it has already responded to: The acceptor ignores the request (or sends a Nack to be polite — the paper doesn’t require this, but implementations always do it because otherwise the proposer just sits there waiting for a response that will never come).
// Acceptor state (must be persistent across crashes!)
state:
    highest_promised = 0        // highest proposal number we've promised
    accepted_proposal = null    // highest proposal number we've accepted
    accepted_value = null       // the value we accepted

function acceptor_on_prepare(n, from_proposer):
    if n > highest_promised:
        highest_promised = n
        // CRITICAL: persist state before responding
        persist_to_disk(highest_promised, accepted_proposal, accepted_value)
        send(from_proposer, Promise(n, accepted_proposal, accepted_value))
    else:
        // Optional but strongly recommended
        send(from_proposer, Nack(n, highest_promised))
That persist_to_disk call is doing enormous work. If the acceptor crashes after sending a Promise but before persisting, and then restarts and accepts a lower-numbered proposal, the safety of the entire protocol is violated. This is the kind of thing that “Paxos Made Simple” mentions in one sentence and that takes a month to get right in production.
Phase 2: Accept and Accepted
Phase 2 is where the actual value gets proposed and (hopefully) chosen.
The Proposer’s Side (Accept)
The proposer examines the promises it received. If any acceptor reported having already accepted a value, the proposer must propose that value — specifically, the value from the highest-numbered accepted proposal among all promises. If no acceptor has accepted anything, the proposer is free to propose its own value.
This is the key insight of Paxos, the thing that makes the whole protocol work: a proposer does not get to freely choose its value. It must defer to any value that might already be chosen. This constraint is what preserves safety.
function proposer_phase2(n, my_value, promises):
    // Find the highest-numbered already-accepted value
    highest_accepted = null
    for promise in promises:
        if promise.accepted_proposal != null:
            if highest_accepted == null or
               promise.accepted_proposal > highest_accepted.proposal:
                highest_accepted = {
                    proposal: promise.accepted_proposal,
                    value: promise.accepted_value
                }
    // Choose value: defer to any already-accepted value
    if highest_accepted != null:
        value = highest_accepted.value
        // Our original value is lost. This is fine. This is consensus.
    else:
        value = my_value
    // Send Accept request to a majority of acceptors
    for acceptor in choose_majority(all_acceptors):
        send(acceptor, Accept(n, value))
    // Wait for Accepted responses
    accepted_responses = wait_for_majority_responses(timeout)
    if |accepted_responses| >= majority_size:
        // Value is chosen!
        notify_learners(value)
        return value
    else:
        // Failed — some acceptor must have promised a higher number
        return proposer_start(my_value)  // Retry from scratch
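The value-selection rule is the heart of Phase 2 and is small enough to test in isolation. A sketch in Python (the function name and the promise representation are illustrative, not from the paper):

```python
def choose_phase2_value(my_value, promises):
    """Pick the Phase 2 value: defer to the value from the
    highest-numbered accepted proposal among the promises, if any."""
    best = None
    for accepted_proposal, accepted_value in promises:
        if accepted_proposal is not None:
            if best is None or accepted_proposal > best[0]:
                best = (accepted_proposal, accepted_value)
    return best[1] if best is not None else my_value

# No acceptor has accepted anything: free to propose our own value.
assert choose_phase2_value("X", [(None, None), (None, None)]) == "X"

# One acceptor already accepted "Y" at proposal (1, 2): we must adopt it.
assert choose_phase2_value("X", [(None, None), ((1, 2), "Y")]) == "Y"

# Two accepted values reported: the higher-numbered one wins.
promises = [((1, 2), "Y"), ((3, 1), "Z"), (None, None)]
assert choose_phase2_value("X", promises) == "Z"
```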
The Acceptor’s Side (Accepted)
When an acceptor receives an Accept(n, value) message:
If n >= highest_promised: The acceptor accepts the proposal — it updates its state to record (n, value) as accepted, and responds with Accepted(n, value).
If n < highest_promised: The acceptor has promised not to accept proposals below highest_promised, so it ignores the request (or sends a Nack).
function acceptor_on_accept(n, value, from_proposer):
    if n >= highest_promised:
        highest_promised = n
        accepted_proposal = n
        accepted_value = value
        // CRITICAL: persist before responding
        persist_to_disk(highest_promised, accepted_proposal, accepted_value)
        send(from_proposer, Accepted(n, value))
        // Also notify learners (discussed below)
        for learner in all_learners:
            send(learner, Accepted(n, value))
    else:
        send(from_proposer, Nack(n, highest_promised))
Learning a Chosen Value
A value is chosen when a majority of acceptors have accepted the same proposal number. Learners need to find out about this. There are a few strategies:
- Each acceptor sends Accepted to all learners. This is O(acceptors * learners) messages but is the fastest — learners find out as soon as possible.
- Each acceptor sends Accepted to a distinguished learner, who tells the others. This reduces messages but adds latency and a single point of failure.
- Each acceptor sends Accepted to a set of distinguished learners. A compromise.
In practice, the proposer typically acts as the distinguished learner: it counts Accepted responses, and once it has a majority, it knows the value is chosen and can notify clients.
function learner_on_accepted(n, value, from_acceptor):
    // Track which acceptors have accepted which proposals
    if n not in acceptance_counts:
        acceptance_counts[n] = {value: value, acceptors: {}}
    acceptance_counts[n].acceptors.add(from_acceptor)
    if |acceptance_counts[n].acceptors| >= majority_size:
        // Value is chosen!
        chosen_value = acceptance_counts[n].value
        deliver_to_application(chosen_value)
The Happy Path: A Walk-Through
Let’s trace through a successful run with three acceptors (A1, A2, A3) and one proposer (P1) proposing value “X”.
Proposer P1 Acceptors A1, A2, A3
| | | |
|--- Prepare(1) ----------->| | |
|--- Prepare(1) ---------------->| |
|--- Prepare(1) --------------------->|
| | | |
|<-- Promise(1, null, null) -| | |
|<-- Promise(1, null, null) ------| |
|<-- Promise(1, null, null) -----------|
| | | |
| (No prior accepted values, so P1 proposes "X")
| | | |
|--- Accept(1, "X") ------->| | |
|--- Accept(1, "X") ------------>| |
|--- Accept(1, "X") ---------------->|
| | | |
|<-- Accepted(1, "X") ------| | |
|<-- Accepted(1, "X") -----------| |
|<-- Accepted(1, "X") ----------------|
| | | |
| (Majority accepted: "X" is chosen)
Two round trips. Four message types. Beautifully simple when nothing goes wrong.
Things always go wrong.
Failure Case 1: Competing Proposers
This is where Paxos earns its reputation. Two proposers, P1 and P2, both trying to get their value chosen.
P1 (value "X") P2 (value "Y") A1 A2 A3
| | | | |
|--- Prepare(1) ------------------> | |
|--- Prepare(1) -----------------------------> |
| | | | |
| |--- Prepare(2) --> | |
| |--- Prepare(2) ------------> |
| | | | |
|<-- Promise(1, null) -------------| | |
|<-- Promise(1, null) ----------------------| |
| | | | |
| |<-- Promise(2, null) ---| |
| |<-- Promise(2, null) ------------|
| | | | |
| (P1 got majority promises) | | |
| (P2 got majority promises) |
| | | | |
|--- Accept(1, "X") ----------->| | |
|--- Accept(1, "X") -------------------->| |
| | | | |
| A2 promised proposal 2 to P2, so it REJECTS Accept(1, "X")
| | | | |
|<-- Accepted(1, "X") ----------| | |
|<-- Nack(1, 2) ------------------------------| |
| | | | |
| (P1 only got 1 Accepted, not a majority — fails) |
P1’s Accept was rejected by A2 because A2 had already promised proposal number 2 to P2. P1 did not get a majority of Accepted responses, so it must retry with a higher proposal number. Meanwhile, P2 can proceed with its Phase 2.
But here’s the nasty part: when P1 retries with proposal number 3, it might preempt P2’s Accept phase. Then P2 retries with 4, preempting P1. This is the dueling proposers problem, also known as livelock.
Failure Case 2: Dueling Proposers (Livelock)
P1 (value "X") P2 (value "Y") A1 A2 A3
| | | | |
| Prepare(1) succeeds | | |
| | Prepare(2) succeeds |
| Accept(1) rejected | | |
| | Accept(2) -- about to send... |
| Prepare(3) succeeds | | |
| | Accept(2) rejected | |
| | Prepare(4) succeeds |
| Accept(3) rejected | | |
| | Accept(4) -- about to send... |
| Prepare(5) succeeds | | |
...forever...
This is not a theoretical concern. This happens in real systems, especially under high load when multiple clients are submitting requests simultaneously.
The standard mitigations:
- Randomized backoff. When a proposer is preempted, it waits a random amount of time before retrying. This is the “just add jitter” school of distributed systems design, and it works surprisingly well in practice.
- Leader election. Elect a single distinguished proposer. If only one proposer is active, there’s no contention. This is what Multi-Paxos does (Chapter 6), and it’s what every production system does.
- Exponential backoff. Like randomized backoff but more aggressive — double the wait time on each failure. Most implementations combine random jitter with exponential backoff.
function proposer_start_with_backoff(value, attempt = 0):
    n = generate_proposal_number(my_id, highest_seen)
    // ... run Phase 1 and Phase 2 ...
    if failed:
        // Back off before retrying
        max_delay = min(BASE_DELAY * 2^attempt, MAX_DELAY)
        delay = random(0, max_delay)
        sleep(delay)
        return proposer_start_with_backoff(value, attempt + 1)
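The same schedule as a runnable sketch (the constants here are illustrative, not recommendations):

```python
import random

BASE_DELAY_MS = 10
MAX_DELAY_MS = 1000

def backoff_delay_ms(attempt, rng=random.random):
    """Exponential backoff with full jitter, capped at MAX_DELAY_MS."""
    max_delay = min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS)
    return rng() * max_delay

# The cap kicks in once BASE_DELAY_MS * 2^attempt exceeds MAX_DELAY_MS
# (attempt >= 7 with these constants).
assert min(BASE_DELAY_MS * 2 ** 7, MAX_DELAY_MS) == 1000
assert 0 <= backoff_delay_ms(3) < BASE_DELAY_MS * 2 ** 3
```

Full jitter (a uniformly random delay up to the cap, rather than a fixed doubled delay) is what actually de-synchronizes the dueling proposers.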
Failure Case 3: Acceptor Crash
Suppose acceptor A3 crashes permanently. With three acceptors (2f+1 = 3, so f = 1), we can tolerate one failure. The protocol proceeds normally with A1 and A2 forming the majority.
But what if A3 crashes after sending a Promise to P1 but before P1’s Accept reaches it? No problem — P1 only needs a majority of Accepted responses. If A1 and A2 both accept, the value is chosen.
What if A3 crashes after accepting a value but before anyone else has? This is more interesting. The accepted value is “stuck” on A3’s disk. If A3 comes back, that value will be reported in future Promise messages, and the protocol ensures it can still be chosen. If A3 never comes back, and no other acceptor accepted that value, then it was never chosen (because a majority never accepted it), and the system can safely choose a different value.
This is a subtle point that trips up many implementers: an individual acceptor accepting a value does not mean the value is chosen. The value is chosen only when a majority of acceptors accept the same proposal number.
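The distinction is mechanical to check. A sketch in Python (the state representation is an assumption for illustration):

```python
from collections import Counter

def chosen_value(acceptor_states, total_acceptors):
    """Return the chosen value, or None. acceptor_states maps an acceptor id
    to its (accepted_proposal, accepted_value) pair, or None if it has
    accepted nothing. A value is chosen only when a majority of acceptors
    accepted the SAME proposal number."""
    majority = total_acceptors // 2 + 1
    counts = Counter(s for s in acceptor_states.values() if s is not None)
    for (proposal, value), n in counts.items():
        if n >= majority:
            return value
    return None

# A3 alone accepted (1, "X"): accepted by one acceptor, NOT chosen.
assert chosen_value({"A1": None, "A2": None, "A3": ((1, 1), "X")}, 3) is None

# A1 and A2 accepted the same proposal: majority of 3, so "X" is chosen.
assert chosen_value({"A1": ((1, 1), "X"),
                     "A2": ((1, 1), "X"),
                     "A3": None}, 3) == "X"
```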
Failure Case 4: Message Loss
Paxos handles message loss by… not handling it. The protocol is correct regardless of which messages are lost. If a Prepare is lost, the proposer times out and retries. If a Promise is lost, the proposer might not get a majority and retries. If an Accept is lost, same thing. If an Accepted is lost, the learner might not find out the value is chosen, but the value IS still chosen, and a future round will discover it.
This is the beauty of Paxos: the safety argument does not depend on message delivery guarantees at all. Liveness requires that messages eventually get through (the “sufficient synchrony” assumption), but safety holds even in a fully asynchronous network with arbitrary message loss.
Why Paxos Is Correct: Intuition
The full proof is in Lamport’s papers, and I won’t reproduce it here. But the intuition is worth understanding.
The key invariant is: if a value v has been chosen (accepted by a majority) with proposal number n, then every higher-numbered proposal that is accepted by any acceptor also has value v.
Why does this hold? Because of the Phase 1 constraint:
- If value v was chosen with proposal n, then a majority of acceptors accepted (n, v).
- Any future proposer with proposal number m > n must do Phase 1 and get promises from a majority.
- Any two majorities overlap in at least one acceptor (this is the pigeonhole principle, doing heroic work).
- That overlapping acceptor has accepted (n, v) and will report it in its Promise.
- The Phase 2 rule forces the new proposer to propose v (since it must use the value from the highest-numbered accepted proposal among the promises it received).
The overlapping quorum is doing all the heavy lifting. This is why you need 2f+1 acceptors — not for availability (though that helps), but for the intersection property that makes the safety proof work.
Majority 1 (chose value v): { A1, A2, A3 } (3 out of 5)
Majority 2 (new proposer): { A3, A4, A5 } (3 out of 5)
^^
Overlap: A3 reports v
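For small clusters you can verify the intersection property exhaustively:

```python
from itertools import combinations

acceptors = {"A1", "A2", "A3", "A4", "A5"}
majority = len(acceptors) // 2 + 1   # 3 of 5

# Every pair of majority quorums shares at least one acceptor
# (3 + 3 > 5, so two 3-subsets of a 5-set must overlap).
for q1 in combinations(acceptors, majority):
    for q2 in combinations(acceptors, majority):
        assert set(q1) & set(q2), "two majorities must overlap"
```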
There is one subtlety here that the “simple” explanations gloss over: proposal numbers between n and m might have started Phase 1 but never completed Phase 2. The proof handles this by induction on proposal numbers, showing the invariant holds for each number in sequence. It’s not complicated, but it is fiddly, and getting the induction right is where most people’s eyes glaze over.
What the Paper Doesn’t Tell You
Now for the part that matters if you’re trying to build something. Lamport’s papers describe the protocol. They do not describe a system. The gap between the two is vast and full of tears.
1. State Machine Replication
Single-decree Paxos agrees on one value. You want to agree on a sequence of commands — a replicated log. The paper waves at this with “just run multiple instances.” Chapter 6 covers this in detail, but the short version is: the devil is in every single detail of how you index those instances, how you handle gaps, how you compact old entries, and how you bring up new replicas.
2. Disk Persistence
The protocol requires that acceptor state survives crashes. This means fsync(). Do you know how slow fsync() is? On a spinning disk, you’re looking at 5-10ms. On an SSD, maybe 0.1-0.5ms. Either way, you’re fsyncing on the critical path — before you can send a Promise or Accepted message. Batching helps. Group commit helps. But the paper doesn’t discuss any of this.
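One common shape for the fix is group commit: buffer state updates and amortize a single fsync across all of them. A minimal sketch, assuming a simple append-only log (the interface is invented for illustration, not a known library):

```python
import os

class GroupCommitLog:
    """Batch acceptor state updates so one fsync covers many of them."""
    def __init__(self, path):
        self.f = open(path, "ab")
        self.pending = []   # callbacks to run once records are durable

    def append(self, record: bytes, on_durable):
        self.f.write(record + b"\n")
        self.pending.append(on_durable)

    def flush(self):
        """One fsync makes every pending record durable. Only after this
        may the acceptor send its Promise/Accepted replies."""
        self.f.flush()
        os.fsync(self.f.fileno())
        callbacks, self.pending = self.pending, []
        for cb in callbacks:
            cb()

# Usage: queue two replies behind one fsync.
import tempfile
path = os.path.join(tempfile.mkdtemp(), "acceptor.log")
log = GroupCommitLog(path)
replies = []
log.append(b"promised:5", lambda: replies.append("Promise(5)"))
log.append(b"accepted:5:X", lambda: replies.append("Accepted(5,X)"))
assert replies == []          # nothing sent before durability
log.flush()
assert replies == ["Promise(5)", "Accepted(5,X)"]
```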
3. Network Timeouts
The paper describes no timeout mechanism. In practice, you need timeouts everywhere:
- How long does a proposer wait for Promise responses?
- How long does it wait for Accepted responses?
- When does a learner give up and ask again?
- How long before a client request is considered failed?
Getting these timeouts wrong doesn’t violate safety (Paxos is always safe!), but it destroys liveness. Set them too low and you get thrashing. Set them too high and the system is unresponsive to failures.
4. How Learners Actually Learn
The paper is remarkably vague about how learners discover chosen values. In practice, this requires one of:
- Having the proposer announce the chosen value (but what if the proposer crashes right after the value is chosen but before announcing it?)
- Having each acceptor send Accepted messages to all learners (lots of messages)
- Having learners periodically query acceptors (latency)
Most implementations punt on this by having the proposer tell the client directly and relying on the next Paxos instance to implicitly confirm the previous one.
5. Read Operations
Paxos agrees on a value. It says nothing about reading that value later. If you want linearizable reads (and you do, or why are you bothering with consensus?), you need one of:
- Run a full Paxos round for every read (expensive)
- Use leases (the leader promises not to accept new proposals for some time period, so reads from the leader are safe within the lease period)
- Read from any majority and take the highest-numbered accepted value (which might not be chosen yet — subtle!)
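The lease approach reduces, on the read path, to a clock check. A sketch (the skew margin and the interface are assumptions for illustration, not from any paper):

```python
import time

class LeaderLease:
    """A leader may serve local reads only while its lease is unexpired.
    The lease must have been granted by a majority, and the check must
    leave a margin for clock skew (the margin below is an assumption)."""
    CLOCK_SKEW_MARGIN = 0.5   # seconds

    def __init__(self):
        self.lease_expiry = 0.0

    def renew(self, duration, now=None):
        now = time.monotonic() if now is None else now
        self.lease_expiry = now + duration

    def can_serve_read(self, now=None):
        now = time.monotonic() if now is None else now
        return now + self.CLOCK_SKEW_MARGIN < self.lease_expiry

lease = LeaderLease()
assert not lease.can_serve_read(now=100.0)   # no lease yet
lease.renew(10.0, now=100.0)                 # lease until t=110
assert lease.can_serve_read(now=105.0)       # safely inside the lease
assert not lease.can_serve_read(now=109.8)   # too close to expiry
```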
6. Client Interaction
What happens when a client submits a request? The paper doesn’t say. You need to figure out:
- Which proposer handles the request?
- What if the proposer crashes after the value is chosen but before responding to the client?
- What if the client retries and the request gets executed twice?
- How does the client find the current leader?
All of these require idempotency tokens, client tables, leader redirect mechanisms, and other infrastructure that the protocol description simply assumes exists.
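The client-table piece, at least, is simple in isolation: remember each client's latest request id and cached response, and on a retry replay the response instead of re-executing. A sketch (the interface is illustrative):

```python
class ClientTable:
    """Deduplicate client retries: execute each (client, request_id)
    at most once, returning the cached response on replay."""
    def __init__(self):
        self.last = {}   # client_id -> (request_id, response)

    def apply(self, client_id, request_id, execute):
        entry = self.last.get(client_id)
        if entry is not None and entry[0] == request_id:
            return entry[1]          # duplicate: replay cached response
        response = execute()
        self.last[client_id] = (request_id, response)
        return response

table = ClientTable()
counter = {"n": 0}
def deposit():
    counter["n"] += 1
    return counter["n"]

assert table.apply("c1", 1, deposit) == 1
assert table.apply("c1", 1, deposit) == 1   # retry: NOT executed again
assert counter["n"] == 1
assert table.apply("c1", 2, deposit) == 2   # new request id: executed
```

This only works if request ids are assigned by clients and the table itself is replicated through the log, which is exactly the extra infrastructure the paper assumes away.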
The Full Acceptor: Putting It Together
Here is the complete acceptor pseudocode with all the details:
class PaxosAcceptor:
    // Persistent state (MUST survive crashes)
    persistent:
        highest_promised: ProposalNumber = 0
        accepted_proposal: ProposalNumber = null
        accepted_value: Value = null

    function on_prepare(n: ProposalNumber, from: ProposerId):
        if n > self.highest_promised:
            self.highest_promised = n
            self.persist()  // fsync!
            reply(from, Promise {
                proposal: n,
                accepted_proposal: self.accepted_proposal,
                accepted_value: self.accepted_value
            })
        else:
            reply(from, Nack {
                proposal: n,
                highest_promised: self.highest_promised
            })

    function on_accept(n: ProposalNumber, value: Value, from: ProposerId):
        if n >= self.highest_promised:
            self.highest_promised = n
            self.accepted_proposal = n
            self.accepted_value = value
            self.persist()  // fsync!
            reply(from, Accepted {
                proposal: n,
                value: value
            })
            // Notify learners
            for learner in self.known_learners:
                send(learner, Accepted {
                    proposal: n,
                    value: value
                })
        else:
            reply(from, Nack {
                proposal: n,
                highest_promised: self.highest_promised
            })

    function persist():
        write_to_disk(self.highest_promised,
                      self.accepted_proposal,
                      self.accepted_value)
        fsync()  // DO NOT SKIP THIS

    function recover():
        // Called on restart
        (self.highest_promised,
         self.accepted_proposal,
         self.accepted_value) = read_from_disk()
        // Resume normal operation — no special recovery needed!
The Full Proposer: Putting It Together
class PaxosProposer:
    state:
        my_id: ProposerId
        highest_seen: SequenceNumber = 0
        quorum_size: int  // majority of acceptors

    function propose(value: Value) -> Value:
        attempt = 0
        while true:
            result = try_propose(value, attempt)
            if result.success:
                return result.chosen_value
            attempt += 1

    function try_propose(value: Value, attempt: int) -> Result:
        // Generate proposal number
        n = (self.highest_seen + 1, self.my_id)
        self.highest_seen = n.sequence
        persist(self.highest_seen)  // Must survive crashes

        // Phase 1: Prepare
        promises = []
        nacks = []
        send_to_all_acceptors(Prepare(n))
        deadline = now() + phase1_timeout
        while |promises| < self.quorum_size and now() < deadline:
            msg = receive(deadline - now())
            if msg is Promise and msg.proposal == n:
                promises.append(msg)
            elif msg is Nack and msg.proposal == n:
                nacks.append(msg)
                // Update highest_seen from nack
                self.highest_seen = max(self.highest_seen,
                                        msg.highest_promised.sequence)
        if |promises| < self.quorum_size:
            backoff(attempt)
            return Result(success=false)

        // Phase 2: Determine value
        highest_accepted = null
        for promise in promises:
            if promise.accepted_proposal != null:
                if highest_accepted == null or
                   promise.accepted_proposal > highest_accepted.proposal:
                    highest_accepted = promise
        if highest_accepted != null:
            value = highest_accepted.accepted_value
        // else: use our original value

        // Phase 2: Accept
        accepted_count = 0
        send_to_all_acceptors(Accept(n, value))
        deadline = now() + phase2_timeout
        while accepted_count < self.quorum_size and now() < deadline:
            msg = receive(deadline - now())
            if msg is Accepted and msg.proposal == n:
                accepted_count += 1
            elif msg is Nack and msg.proposal == n:
                self.highest_seen = max(self.highest_seen,
                                        msg.highest_promised.sequence)
                backoff(attempt)
                return Result(success=false)
        if accepted_count >= self.quorum_size:
            return Result(success=true, chosen_value=value)
        else:
            backoff(attempt)
            return Result(success=false)

    function backoff(attempt: int):
        max_delay = min(BASE_DELAY_MS * 2^attempt, MAX_DELAY_MS)
        sleep(random(0, max_delay))
Lamport’s Writing and Its Effect on the Field
It would be incomplete to discuss Paxos without discussing how it was communicated to the world, because the communication strategy — or perhaps the miscommunication strategy — materially affected how the field developed.
“The Part-Time Parliament” is a work of dry humor wrapped around a profound result. The fictional conceit — legislators on the island of Paxos agreeing on decrees — maps onto the protocol with remarkable precision. The roles of proposers, acceptors, and learners correspond to Paxon politicians, and the protocol phases correspond to parliamentary procedures.
The problem is that most readers are not trying to appreciate literary craft when reading a distributed systems paper. They are trying to understand how to make their database not lose data. The fictional frame adds cognitive overhead that, for many readers, obscures rather than illuminates.
“Paxos Made Simple,” for all its claims of simplicity, is dense. It presents the protocol in about five pages, which is admirably concise but leaves no room for the examples and failure-case analysis that most engineers need to build intuition. The paper’s brevity is itself a kind of opacity.
The result was that for years, Paxos was viewed as a mysterious, almost mystical algorithm — understood by a priesthood of distributed systems researchers and feared by everyone else. This reputation was not entirely undeserved (the protocol IS subtle), but it was amplified enormously by the way it was presented. The success of Raft (Chapter 8), which solves essentially the same problem but was designed explicitly for understandability, is Exhibit A in the case that Paxos’s difficulty was at least partly a communication failure.
Common Misconceptions
“Paxos requires three rounds of messages.” No. The happy path is two rounds: Prepare/Promise and Accept/Accepted. The learner notification is a third round but is not part of the consensus protocol per se.
“Paxos requires all acceptors to respond.” No. It requires a majority (quorum). That’s the whole point — it tolerates f failures out of 2f+1 acceptors.
“A proposer can choose any value.” Only if no acceptor reports a previously accepted value. Otherwise the proposer is constrained.
“Paxos guarantees liveness.” No. FLP says it can’t, and the dueling proposers scenario is a concrete demonstration. It guarantees safety always and liveness under reasonable timing assumptions.
“Once an acceptor accepts a value, that value is chosen.” No. The value is chosen when a majority accepts the same proposal number. An individual accept means nothing by itself.
“Paxos is just two-phase commit.” Absolutely not. 2PC requires all participants to vote yes. Paxos requires only a majority. 2PC blocks if the coordinator crashes. Paxos makes progress as long as a majority is reachable. They solve fundamentally different problems, though the phase structure looks superficially similar.
Performance Characteristics
In the steady state (single proposer, no failures):
- Message complexity: 4n messages for n acceptors (n Prepares + n Promises + n Accepts + n Accepteds). That count assumes the proposer doubles as the distinguished learner; if every acceptor notifies every learner directly, add n*m messages, where m is the number of learners.
- Round trips: 2 (Prepare/Promise, then Accept/Accepted).
- Latency: Dominated by the two sequential fsyncs on the acceptor side (one for Promise, one for Accepted) and network round trip time.
- Throughput: One value chosen per two round trips. This is terrible if you’re choosing one value at a time. Multi-Paxos (Chapter 6) fixes this.
The Bridge to Multi-Paxos
Single-decree Paxos is a building block, not a system. To build a replicated state machine, you need to agree on a sequence of commands, not just one. The naive approach — run a separate instance of Paxos for each command — works but is absurdly expensive: two round trips per command, each requiring fsyncs.
Multi-Paxos optimizes this by electing a stable leader who can skip Phase 1 for most commands, reducing the steady-state cost to a single round trip. But as we’ll see in the next chapter, the distance between “just run multiple Paxos instances with a stable leader” and a working system is approximately the distance between Earth and Alpha Centauri.
Single-decree Paxos is beautiful. It is minimal, elegant, and provably correct. It is also, by itself, almost entirely useless for building real systems. That is the agony of Paxos: the core insight is perfect, and everything around it is pain.