
The Byzantine Generals Problem

In 1982, Leslie Lamport, Robert Shostak, and Marshall Pease published “The Byzantine Generals Problem,” a paper that gave distributed computing its most evocative metaphor and its most expensive failure model. The paper’s contribution is not the metaphor — it is the proof that tolerating arbitrary (Byzantine) failures requires fundamentally more resources than tolerating crash failures, and the precise characterization of exactly how much more.

The Byzantine failure model is named not because it was invented in Byzantium, but because Lamport wanted a metaphor involving treacherous generals and chose the Byzantine Empire for historical color. The name stuck, and now every distributed systems paper must explain that “Byzantine” means “arbitrarily faulty” and not “unnecessarily complicated” — though in practice, Byzantine fault tolerance is both.

The Problem

An army surrounds a city. The army is divided into divisions, each commanded by a general. The generals can communicate only by messenger. They must agree on a common plan of action: attack or retreat. Some generals may be traitors. The traitors can send different messages to different generals, lie about what messages they received, and generally do anything to prevent the loyal generals from reaching agreement.

The requirements:

  1. All loyal generals decide on the same plan. (Agreement)
  2. If all loyal generals propose the same plan, that plan is the decision. (Validity)

Notice that we do not care what the traitors decide. We only care that the loyal generals agree with each other. Also notice that there is no requirement that the loyal generals detect who the traitors are — only that they reach agreement despite the traitors’ interference.

Translated to computer science: we have N processes, up to f of which may exhibit arbitrary (Byzantine) failures. The correct processes must agree on a value, despite the Byzantine processes sending contradictory, fabricated, or strategically timed messages.

Why 3f+1: The Impossibility with 3 Generals

The most important result in the paper is negative: with only 3 generals, consensus is impossible if even 1 is a traitor (without digital signatures). Let us see why.

Scenario Setup

Three generals: A (the commander, who proposes a value), B, and C. One of them is a traitor. We consider all cases.

Case 1: The Commander (A) Is the Traitor

        A (TRAITOR)
       / \
      /   \
  "attack" "retreat"
    /         \
   B           C

B receives: "attack" from A
C receives: "retreat" from A

Phase 2: B and C exchange what they heard from A.
B tells C: "A said attack"
C tells B: "A said retreat"

B sees: A said "attack", C says A said "retreat"
    -> B does not know who is lying: A or C
C sees: A said "retreat", B says A said "attack"
    -> C does not know who is lying: A or B

B and C cannot determine whether A sent inconsistent messages (A is the traitor) or whether the other lieutenant is lying about what A said (B or C is the traitor). They have symmetric, contradictory information.

Case 2: Lieutenant C Is the Traitor

        A (honest)
       / \
      /   \
  "attack" "attack"
    /         \
   B           C (TRAITOR)

B receives: "attack" from A
C receives: "attack" from A (but C is a traitor)

Phase 2: B and C exchange what they heard from A.
B tells C: "A said attack" (honest)
C tells B: "A said retreat" (LIE)

B sees: A said "attack", C says A said "retreat"
    -> This looks IDENTICAL to Case 1 from B's perspective

From B’s perspective, Cases 1 and 2 are indistinguishable. B received “attack” from A and heard C claim “retreat.” B cannot determine whether A is a traitorous commander who sent different messages, or C is a traitorous lieutenant who is lying.

Since B cannot distinguish the cases, B must take the same action in both. But in Case 2, the correct decision is “attack” (that is what the honest commander ordered), while in Case 1, the decision could go either way (the commander is the traitor, so there is no “correct” order, but B and C must still agree).

If B decides “attack” in both cases (following the commander), then in Case 1, C (who received “retreat” from the commander) would follow the commander and decide “retreat.” B and C disagree. Agreement violated.

If B instead decides by majority over the values it holds (the commander's value plus C's relayed claim), the traitor can always craft messages that produce a tie, so majority voting cannot break the symmetry either.

This is not a matter of finding a cleverer protocol. No protocol can solve the problem with 3 generals and 1 traitor when messages are unauthenticated. The proof generalizes: with N generals and f traitors, the problem is unsolvable unless N > 3f. You need 3f+1 generals to tolerate f traitors.

The Generalized Impossibility Argument

The 3-generals impossibility generalizes through a simulation argument. Suppose you have a protocol P that works with N = 3f generals tolerating f faults. Lamport, Shostak, and Pease show that you can use P to construct a protocol for 3 generals tolerating 1 fault (each of the 3 generals simulates f of the 3f generals). But we just showed 3-generals-1-fault is impossible. Contradiction. Therefore P cannot exist. You need N >= 3f + 1.

The intuition is this: with 3f generals and f traitors, the 2f honest generals need to “outvote” the f traitors. But the traitors can pretend to be on either side of a disagreement, effectively casting f votes for each side. The 2f honest generals split into two groups — those who heard “attack” and those who heard “retreat” — and each group might be as small as f. The traitors can always make the groups equal, preventing a majority.

With 3f+1 generals and f traitors, the 2f+1 honest generals form a strict majority. Even if the traitors conspire optimally, they cannot make two conflicting values both appear to have majority support among the honest generals.
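This intuition can be stated as a quorum-size constraint and checked mechanically: a quorum must be reachable with f nodes silent (liveness), and any two quorums must overlap in at least one honest node (safety), so conflicting values cannot both be certified. The Python sketch below is our formulation of the standard quorum-intersection argument, not the paper's proof:

```python
def bft_quorum_exists(n: int, f: int) -> bool:
    """Search for a quorum size q that is both:
    - live:  a quorum can form even with f nodes silent (q <= n - f)
    - safe:  any two quorums overlap in >= f + 1 nodes, guaranteeing
             at least one honest node in the intersection (2q - n >= f + 1)
    """
    return any(q <= n - f and 2 * q - n >= f + 1 for q in range(1, n + 1))

# A workable quorum size exists exactly when n >= 3f + 1:
for f in (1, 2, 3):
    print(f, bft_quorum_exists(3 * f, f), bft_quorum_exists(3 * f + 1, f))
```

For N = 3f+1, the search finds q = 2f+1, the quorum size PBFT-style protocols actually use; for N = 3f, no size satisfies both constraints.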

The Oral Messages Algorithm: OM(m)

Lamport, Shostak, and Pease provided a constructive algorithm for the case N >= 3f+1. The Oral Messages algorithm OM(m) solves Byzantine agreement for m traitors using N >= 3m+1 generals.

“Oral messages” means messages are unauthenticated: the receiver knows who sent a message, but cannot prove to a third party what was sent. A traitor can claim to have received any message from any general.

OM(0): Base Case

When m = 0 (no traitors), the algorithm is trivial:

procedure OM(0, commander, value):
    // Commander sends value to all lieutenants
    commander sends value to all N-1 lieutenants
    each lieutenant decides on the value received

No traitors, no problems. If a lieutenant does not receive a value (should not happen with m=0), it defaults to a predetermined value, say RETREAT.

OM(1): One Traitor

This is the interesting case. We need N >= 4 generals to tolerate 1 traitor.

procedure OM(1, commander, value):
    // Step 1: Commander sends value to all lieutenants
    commander sends v_i to each lieutenant i
    // (If commander is honest, all v_i are the same)
    // (If commander is traitor, v_i may differ)

    // Step 2: Each lieutenant relays what it received
    for each lieutenant i:
        lieutenant i sends "commander told me v_i"
            to all other lieutenants
        // (If lieutenant i is honest, it sends the true v_i)
        // (If lieutenant i is a traitor, it can lie)

    // Step 3: Each lieutenant decides by majority vote
    for each lieutenant i:
        // i has its own value v_i from the commander
        // i has received reported values from all other lieutenants
        // Collect all values (own + reported)
        values = [v_i] + [reported values from each other lieutenant]
        decision = majority(values)  // or default if no majority

Let us trace through with 4 generals (A = commander, B, C, D = lieutenants) and 1 traitor.

Case: Commander A is the traitor.

A sends "attack" to B, "retreat" to C, "attack" to D

Step 2 (relaying):
B tells C: "A said attack"     B tells D: "A said attack"
C tells B: "A said retreat"    C tells D: "A said retreat"
D tells B: "A said attack"     D tells C: "A said attack"

Step 3 (voting):
B sees: attack(self), retreat(from C), attack(from D) -> majority: ATTACK
C sees: retreat(self), attack(from B), attack(from D) -> majority: ATTACK
D sees: attack(self), retreat(from C), attack(from B) -> majority: ATTACK

All loyal lieutenants decide ATTACK. Agreement holds!

Even though the commander sent different values, the relay step allows the honest lieutenants to reconstruct a majority. The key: there are 3 honest lieutenants and only 1 traitor, so the honest majority’s relay messages dominate.

Case: Lieutenant D is the traitor.

A sends "attack" to B, C, D (honest commander, all the same)

Step 2 (relaying):
B tells C: "A said attack"    B tells D: "A said attack"
C tells B: "A said attack"    C tells D: "A said attack"
D tells B: "A said retreat"   D tells C: "A said retreat"  (LIES!)

Step 3 (voting):
B sees: attack(self), attack(from C), retreat(from D) -> majority: ATTACK
C sees: attack(self), attack(from B), retreat(from D) -> majority: ATTACK
D: traitor, we don't care what it decides

Both honest lieutenants decide ATTACK. Agreement holds!

D’s lies are outvoted by the honest majority. This is why 3f+1 is needed: the honest lieutenants’ voices must outnumber the traitors’ lies.
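The two traces above can be reproduced with a small simulation. This Python sketch is illustrative (the names and data structures are ours, not the paper's); it implements OM(1)'s relay-and-majority-vote logic for four generals:

```python
# Toy simulation of OM(1) with commander A and lieutenants B, C, D.
# A real implementation would need authenticated channels and timeouts.
ATTACK, RETREAT = "attack", "retreat"

def om1(commander_sends: dict, liars: dict) -> dict:
    """commander_sends: lieutenant -> value the commander sent it.
    liars: traitorous lieutenant -> value it falsely relays.
    Returns each honest lieutenant's majority decision."""
    lieutenants = list(commander_sends)
    decisions = {}
    for i in lieutenants:
        if i in liars:
            continue  # we do not care what traitors decide
        votes = [commander_sends[i]]  # own value from the commander
        for j in lieutenants:
            if j == i:
                continue
            # j relays what it (claims it) heard from the commander
            votes.append(liars.get(j, commander_sends[j]))
        decisions[i] = max(set(votes), key=votes.count)
    return decisions

# Case 1: the commander is the traitor and sends mixed values.
print(om1({"B": ATTACK, "C": RETREAT, "D": ATTACK}, liars={}))
# Case 2: lieutenant D is the traitor and relays a lie.
print(om1({"B": ATTACK, "C": ATTACK, "D": ATTACK}, liars={"D": RETREAT}))
```

In both cases every honest lieutenant decides ATTACK, matching the traces above.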

OM(m): The General Case

For m traitors, the algorithm is recursive. Each level of recursion reduces the problem by one traitor, at the cost of N-1 sub-invocations.

procedure OM(m, commander, value):
    if m == 0:
        // Base case: trust the commander
        commander sends value v to all lieutenants
        each lieutenant uses v (or DEFAULT if not received)
        return

    // Step 1: Commander sends value to all lieutenants
    commander sends v_i to each lieutenant i

    // Step 2: Each lieutenant acts as commander for OM(m-1)
    for each lieutenant i:
        lieutenant i runs OM(m-1) as commander,
            sending its received value v_i
            to all other lieutenants (excluding original commander)

    // Step 3: Each lieutenant decides by majority
    for each lieutenant i:
        // For each other lieutenant j:
        //   Let w_j = the value that i decided from j's OM(m-1) sub-call
        // i's own value from commander = v_i
        values = {w_j for each lieutenant j != i} union {v_i}
        decision = majority(values)

The message complexity is exponential: O(N^(m+1)). For m=1 and N=4, this is 4^2 = 16 messages. For m=2 and N=7, this is 7^3 = 343 messages. This is not a typo. The algorithm is theoretically important but impractical for more than a handful of traitors.
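The recursion behind these figures can be made concrete. The short Python function below is a sketch that counts one message per point-to-point send; exact totals therefore land below the asymptotic O(N^(m+1)) figures quoted above, which bound the growth rather than give exact counts:

```python
def om_messages(n: int, m: int) -> int:
    """Exact message count for OM(m) with n participants, counting one
    message per point-to-point send."""
    if m == 0:
        return n - 1  # commander sends to each lieutenant
    # Commander sends n-1 messages, then each of the n-1 lieutenants
    # acts as commander for OM(m-1) among the remaining n-1 generals.
    return (n - 1) + (n - 1) * om_messages(n - 1, m - 1)

print(om_messages(4, 1))  # 3 + 3*2 = 9
print(om_messages(7, 2))  # 6 + 6*(5 + 5*4) = 156
print(om_messages(10, 3))
```

The counts grow by roughly a factor of N per extra traitor tolerated, which is the exponential blowup the text describes.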

The Cost of OM(m)

m (traitors) | N (minimum nodes) | Messages    | Rounds
1            | 4                 | O(N^2)      | 2
2            | 7                 | O(N^3)      | 3
3            | 10                | O(N^4)      | 4
f            | 3f+1              | O(N^(f+1))  | f+1

Nobody uses OM(m) in production. It exists to prove that the problem is solvable with 3f+1 nodes. Practical Byzantine protocols (PBFT, HotStuff, Tendermint) achieve polynomial message complexity.

Signed Messages: Changing the Rules

The impossibility result (N >= 3f+1 for unauthenticated messages) relies on a critical assumption: messages are “oral,” meaning a traitor can claim to have received any message from anyone, and no one can prove otherwise.

Digital signatures change this. If every message is signed with the sender’s private key, a traitor cannot forge messages from honest generals. This changes the bounds dramatically.

With signed messages, Byzantine consensus is solvable with N >= f+2 nodes (rather than 3f+1). The Signed Messages algorithm SM(m) from the same paper achieves this.

procedure SM(m, commander, value):
    // Commander signs and sends value to all lieutenants
    commander sends (value, sign(commander, value)) to all

    // Each lieutenant that receives a valid message:
    //   - Adds it to its set of known values
    //   - If it hasn't relayed this message yet, signs and forwards it

    procedure ON_RECEIVE(msg, signatures):
        if all signatures are valid:
            add msg.value to known_values
            if |signatures| <= m:
                // Still need more relay steps
                add my_signature to signatures
                forward (msg.value, signatures) to all who haven't signed

    // After m+1 rounds:
    if |known_values| == 1:
        decide the single value
    else:
        decide DEFAULT  // commander sent different values (is a traitor)

Why does this work with fewer nodes? Because signatures prevent the core attack: a traitor cannot impersonate an honest general. When B tells C “A said attack,” B’s signature proves the message is from B, and A’s signature (forwarded by B) proves A actually sent “attack.” C can verify the chain of signatures without trusting B.
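The chain-of-signatures idea can be sketched in a few lines. The example below uses HMAC with per-node secret keys as a stand-in for real digital signatures — HMAC is symmetric, so this only simulates the mechanism; a production system would use an asymmetric scheme such as Ed25519. All names and structures are illustrative:

```python
import hashlib
import hmac

# Toy per-node keys; a real system would use asymmetric key pairs.
KEYS = {name: name.encode() * 8 for name in ("A", "B", "C")}

def sign(node: str, payload: bytes) -> bytes:
    return hmac.new(KEYS[node], payload, hashlib.sha256).digest()

def make_chain(value: str, signers: list) -> tuple:
    """Each signer signs the value plus all earlier signatures,
    so the relay order is baked into the chain."""
    payload, sigs = value.encode(), []
    for node in signers:
        so_far = b"".join(s for _, s in sigs)
        sigs.append((node, sign(node, payload + so_far)))
    return value, sigs

def verify_chain(value: str, sigs: list) -> bool:
    payload, seen = value.encode(), b""
    for node, sig in sigs:
        expected = hmac.new(KEYS[node], payload + seen, hashlib.sha256).digest()
        if not hmac.compare_digest(sig, expected):
            return False
        seen += sig
    return True

msg = make_chain("attack", ["A", "B"])  # A ordered attack, B relayed it
print(verify_chain(*msg))               # the chain checks out
tampered = ("retreat", msg[1])          # a traitor alters the value in transit
print(verify_chain(*tampered))          # verification fails
```

This is the property the text describes: C can verify both A's order and B's relay without trusting B's word.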

The tradeoff: every message requires cryptographic operations. Signing and verifying digital signatures has non-trivial CPU cost. In a high-throughput system, this can be a bottleneck.

Practical Implications of Signatures

In the era of Lamport, Shostak, and Pease, digital signatures were expensive and their security assumptions were debatable. Today, Ed25519 signature verification takes about 70 microseconds on commodity hardware, and we have well-understood signature schemes.

Most modern Byzantine protocols use signatures liberally:

Protocol   | Signature Usage                              | Benefit
PBFT       | Signatures on all protocol messages          | Prevents impersonation, enables view change proofs
HotStuff   | Threshold signatures for quorum certificates | Reduces message complexity to O(N)
Tendermint | Signatures on votes                          | Enables evidence-based slashing

The SM(m) algorithm’s f+2 bound is theoretically optimal for signed messages, but practical protocols still use 3f+1 nodes. Why? Because the SM(m) algorithm has exponential message complexity (like OM(m)), and the additional nodes in PBFT-style protocols enable polynomial-complexity algorithms. The extra nodes buy you efficiency, not just fault tolerance.

Crash vs. Byzantine: A Practical Comparison

The distinction between crash and Byzantine failures is not just academic. It has concrete implications for system design.

What Crash Failures Look Like

A crashed node stops responding. It does not send any messages. It does not corrupt data. It either participates correctly or not at all.

// Crash failure behavior
procedure CRASH_NODE():
    state = CORRECT
    while state == CORRECT:
        msg = receive()
        response = PROTOCOL_CORRECT_BEHAVIOR(msg)
        send(response)
        if random_hardware_event():
            state = DEAD  // stops executing
    // Silent forever after

Detection: eventually, other nodes notice the silence (missing heartbeats). The detection may be slow, but there is no ambiguity about the type of failure once it is detected.

What Byzantine Failures Look Like

A Byzantine node can do anything. In practice, “anything” usually manifests as one of these patterns:

Software bugs causing inconsistent behavior:

// A bug that looks Byzantine
procedure BUGGY_SERIALIZE(value):
    if platform == "x86":
        return serialize_little_endian(value)
    else:
        return serialize_big_endian(value)
    // Different nodes running different architectures
    // send different byte representations of the same value
    // Recipients interpret them as different values

Hardware corruption:

// A bit flip in memory or on the wire
procedure CORRUPTED_RESPONSE(msg):
    response = correct_response(msg)
    if cosmic_ray_hits_memory():
        response.value = response.value XOR random_bit
    return response
    // The node THINKS it sent the right answer

Equivocation (the classic Byzantine behavior):

// A compromised node deliberately sends different values
procedure EQUIVOCATING_NODE(msg):
    if msg.sender == "Node A":
        send (VOTE, "X") to Node A
    else if msg.sender == "Node B":
        send (VOTE, "Y") to Node B
    // A and B each think this node voted for their preferred value

Equivocation is the hardest Byzantine behavior to detect and the one that most protocols are designed to prevent. It is also the behavior that requires 3f+1 nodes to overcome: with 2f+1 nodes and f Byzantine nodes equivocating, honest nodes cannot determine which of two conflicting claims is supported by a true majority.
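Once votes are signed, equivocation becomes provable after the fact: two validly signed, conflicting votes for the same round are self-contained evidence, which is what slashing-based protocols like Tendermint exploit. A minimal detection sketch (the data structures are illustrative, not any real protocol's):

```python
def find_equivocators(votes: list) -> list:
    """votes: list of (node, round, value) tuples, assumed already
    signature-verified. Returns nodes that voted for two different
    values in the same round."""
    seen, guilty = {}, set()
    for node, rnd, value in votes:
        key = (node, rnd)
        if key in seen and seen[key] != value:
            guilty.add(node)  # conflicting votes = proof of equivocation
        seen.setdefault(key, value)
    return sorted(guilty)

votes = [("n1", 7, "X"), ("n2", 7, "X"), ("n3", 7, "X"), ("n3", 7, "Y")]
print(find_equivocators(votes))  # ['n3']
```

Note that this detects equivocation only once both conflicting votes reach the same observer; preventing it from affecting decisions in the first place is what the 3f+1 quorums are for.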

The Cost Comparison

Property                  | Crash Tolerance              | Byzantine Tolerance
Nodes for f faults        | 2f+1                         | 3f+1
Messages per decision     | O(N)                         | O(N^2) typical
Latency (rounds)          | 2 (common case)              | 3-4 (common case)
Crypto required           | Optional (TLS for transport) | Essential (signatures on every message)
Implementation complexity | Moderate                     | High
Typical throughput        | 10K-100K decisions/sec       | 1K-10K decisions/sec
Code complexity           | ~1000 lines (Raft)           | ~5000-15000 lines (PBFT)

The 3f+1 requirement alone is significant. To tolerate 1 fault, you need 4 nodes instead of 3. To tolerate 2, you need 7 instead of 5. In a geo-distributed system where each “node” is a datacenter, the infrastructure cost is substantial.

When Do You Actually Need BFT?

This is the question that separates pragmatists from purists. The answer depends on your threat model.

You Probably Need BFT If:

Your nodes are operated by different organizations that do not trust each other. This is the blockchain use case. If node operators have financial incentive to cheat, crash-fault tolerance is insufficient. A rational adversary will not just crash — it will equivocate, fork the chain, or double-spend.

Your system processes financial transactions across trust boundaries. Cross-organizational payment systems, clearing houses, and settlement systems operate in an environment where any participant might try to cheat. BFT ensures that no coalition of up to f participants can forge or alter transactions.

Regulatory or safety requirements mandate it. Some aviation and medical systems require Byzantine fault tolerance because the consequences of a single corrupted node are catastrophic. Aircraft flight control systems, for example, use Byzantine agreement among redundant flight computers.

Your system is exposed to active adversaries. If an attacker can compromise some (but not all) of your nodes, BFT prevents the compromised nodes from corrupting the system’s decisions. This is relevant for military systems, some critical infrastructure, and high-value targets.

You Probably Do Not Need BFT If:

All your nodes run the same software on hardware you control. If you are running a 5-node etcd cluster in your own datacenter, the primary threats are hardware failure and software bugs. Hardware failures manifest as crash failures. Software bugs affect all nodes simultaneously (because they all run the same code), which BFT does not help with anyway — if all nodes have the same bug, 3f+1 nodes with f+1 buggy ones all exhibit the same wrong behavior.

Your performance requirements are tight. BFT’s overhead — extra nodes, extra messages, cryptographic operations — is real. If you need sub-millisecond consensus latency or tens of thousands of decisions per second, BFT protocols may not meet your requirements.

Your failure model is actually crash-recovery. Most datacenter failures are crashes, network partitions, and transient errors — all of which are handled by crash-fault-tolerant protocols without BFT’s overhead. Using BFT “just in case” is like wearing a radiation suit to the grocery store: technically safer, but the cost-benefit analysis does not work out.

Your system already has a trusted component. If all nodes share a trusted hardware module (TPM, SGX enclave) or a trusted external service, you can use that trust to simplify the protocol. For example, if nodes can attest their software version via a TPM, you can rule out software-based Byzantine behavior and use a simpler crash-fault protocol.

The Gray Area: Non-Malicious Byzantine Faults

There is an awkward middle ground: what about Byzantine faults that are not caused by adversaries?

  • Bit flips from cosmic rays (real but rare at sea level; rates rise with altitude and are documented in large-scale DRAM studies)
  • Firmware bugs in disk controllers that silently corrupt data (real, and documented by studies from CERN and others)
  • Network equipment that corrupts packets in ways not caught by TCP checksums (rare but documented)
  • Software bugs that cause different behavior on different nodes (common during rolling upgrades)

These are technically Byzantine failures — the affected node sends incorrect data — but they are not adversarial. A corrupted flight computer does not know it is sending wrong data.

For most systems, the practical defense against non-malicious Byzantine faults is not BFT. It is:

  • ECC memory (corrects single-bit errors)
  • Checksums at the application level (detects data corruption)
  • Crash on inconsistency (turn a Byzantine fault into a crash fault)
  • Testing and code review (prevent software bugs)

These defenses are cheaper and more effective than BFT for the non-adversarial case. Use BFT when the threat is adversarial, not when it is accidental.
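The third item — crash on inconsistency — deserves a sketch, because it is the cheapest and most broadly applicable of these defenses. The toy store below is illustrative (a real system would checksum at every I/O boundary); it verifies a CRC on every read and halts rather than serve corrupt data:

```python
import zlib

class ChecksummedStore:
    """Attach a checksum to every stored value and refuse to serve data
    that fails verification, converting silent corruption (a Byzantine
    fault) into an explicit stop (a crash fault)."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = (value, zlib.crc32(value))

    def get(self, key: str) -> bytes:
        value, checksum = self._data[key]
        if zlib.crc32(value) != checksum:
            # Better to stop than to propagate corrupt data downstream.
            raise SystemExit(f"corruption detected for {key!r}; halting")
        return value

store = ChecksummedStore()
store.put("plan", b"attack")
print(store.get("plan"))
# Simulate a bit flip in memory:
value, checksum = store._data["plan"]
store._data["plan"] = (b"attbck", checksum)
# store.get("plan") would now terminate the process instead of returning garbage
```

The crash-fault-tolerant consensus layer then handles the resulting silence, which is exactly the failure model it was built for.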

PBFT: The Practical Byzantine Protocol

The OM(m) algorithm is exponential and impractical. The protocol that made BFT feasible for real systems is PBFT (Practical Byzantine Fault Tolerance) by Castro and Liskov, published in 1999. We will cover PBFT in detail later in the book, but a brief sketch is relevant here.

PBFT achieves O(N^2) message complexity (compared to OM’s exponential) through a three-phase protocol:

// PBFT normal case (simplified)
// N = 3f + 1 nodes, one is the primary (leader)

procedure PBFT_NORMAL_CASE(client_request):
    // Phase 0: Client sends request to primary
    primary receives request R

    // Phase 1: PRE-PREPARE
    primary assigns sequence number n to R
    primary broadcasts <PRE-PREPARE, view, n, R> to all replicas
    // Signed by primary

    // Phase 2: PREPARE
    each replica i (that accepts the pre-prepare):
        broadcasts <PREPARE, view, n, digest(R), i> to all replicas
        // Signed by i

    // A replica has "prepared" when it has:
    //   - The pre-prepare
    //   - 2f matching PREPARE messages from different replicas
    // (2f prepares + 1 pre-prepare = 2f+1 out of 3f+1 nodes agree)

    // Phase 3: COMMIT
    each replica i (that has prepared):
        broadcasts <COMMIT, view, n, i> to all replicas
        // Signed by i

    // A replica has "committed" when it has:
    //   - 2f+1 matching COMMIT messages (including its own)
    // The request is now committed and can be executed

    // Phase 4: REPLY
    each replica sends result to client
    client accepts result when it has f+1 matching replies

The three phases ensure that:

  1. PRE-PREPARE: The primary assigns an order to requests.
  2. PREPARE: Replicas agree on the order (prevents a Byzantine primary from assigning different orders to different replicas).
  3. COMMIT: Replicas confirm that enough others have prepared (ensures the decision survives view changes).

The O(N^2) message complexity comes from the all-to-all broadcasts in PREPARE and COMMIT phases. Each of the N nodes sends a message to each of the N-1 others, twice (for PREPARE and COMMIT). Total: ~2N^2 messages per consensus decision.

For N=4 (f=1), that is 32 messages per decision. For N=7 (f=2), it is 98. For N=100 (f=33), it is 20,000. The N^2 scaling is why BFT systems typically run with small replica counts (4-7 nodes), not the hundreds or thousands common in crash-fault systems.
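These figures can be reproduced with a small model. The function below is a rough count: it tallies the pre-prepare fan-out plus the two all-to-all phases and omits client traffic, so exact totals land slightly under the rounded ~2N^2 numbers above:

```python
def pbft_messages(n: int) -> int:
    """Approximate normal-case message count for one PBFT decision,
    counting only protocol traffic among the n replicas."""
    pre_prepare = n - 1    # primary sends to each other replica
    prepare = n * (n - 1)  # every replica broadcasts to every other
    commit = n * (n - 1)   # and again for the commit phase
    return pre_prepare + prepare + commit

for f in (1, 2, 33):
    n = 3 * f + 1
    print(f"f={f}: N={n}, ~{pbft_messages(n)} messages per decision")
```

The quadratic terms dominate, which is why replica counts in deployed BFT systems stay small.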

Modern BFT protocols (HotStuff, 2019; Narwhal/Bullshark, 2022) reduce this to O(N) messages using threshold signatures and leader-based communication patterns, at the cost of additional cryptographic assumptions.

The Real Cost of Byzantine Tolerance

Let us be concrete about what BFT costs in practice.

Throughput

A well-optimized Raft implementation (crash-fault tolerant) can achieve:

  • 50,000-100,000 operations per second
  • 1-5ms latency per operation
  • On a 5-node cluster with commodity hardware

A well-optimized PBFT implementation (Byzantine-fault tolerant) achieves:

  • 5,000-20,000 operations per second
  • 5-20ms latency per operation
  • On a 4-node cluster with commodity hardware

That is a 5-10x throughput reduction and a 4-5x latency increase. Modern BFT protocols (HotStuff, Tendermint) narrow the gap, but there is still a meaningful overhead.

Operational Complexity

BFT protocols require:

  • Key management for all nodes (private keys, certificates)
  • Secure channels between all pairs of nodes
  • View change protocols that are significantly more complex than Raft’s leader election
  • Client-side logic to collect and verify f+1 matching responses

Each of these is a source of operational overhead and potential bugs.

The Irony of Byzantine Tolerance

Here is the uncomfortable truth that BFT researchers acknowledge in private but rarely write in papers: the biggest threat to a BFT system is usually a bug in the BFT implementation itself.

A PBFT implementation is roughly 5-15x more code than a Raft implementation. More code means more bugs. A bug in the BFT protocol implementation can cause all nodes to fail in the same way (a correlated failure), which no amount of replication can tolerate. The protocol tolerates f arbitrary failures out of 3f+1 nodes, but a universal implementation bug is an f=N failure.

This is not hypothetical. Multiple blockchain systems have had consensus bugs that affected all validators simultaneously. The BFT protocol was working perfectly — it was the code implementing it that was wrong.

This does not mean BFT is useless. It means that BFT is a defense against heterogeneous failures — failures that affect some nodes differently than others. For homogeneous failures (same bug on all nodes), you need diverse implementations, N-version programming, or other orthogonal defenses.

Historical Context and Legacy

The 1982 paper by Lamport, Shostak, and Pease is one of the most cited papers in computer science. Its contributions are:

  1. Formalizing the problem. Before this paper, “fault tolerance” was vague. After it, we had precise definitions and provable bounds.

  2. The 3f+1 lower bound. The impossibility proof for N <= 3f (without signatures) is clean and surprising. Many people’s intuition says 2f+1 should suffice (it does for crash failures). The gap between 2f+1 and 3f+1 is the price of distrust.

  3. The signed messages result. Showing that digital signatures reduce the bound to f+2 anticipated the importance of cryptography in distributed systems by decades.

  4. The recursive algorithm. OM(m) is impractical but proves that the problem is solvable. This existence proof motivated decades of work on practical BFT protocols.

The paper also has its share of hand-waving, to be fair. The timing model is synchronous (messages are delivered within a known bound), which is unrealistic for most systems. The extension to asynchronous or partially synchronous models required another 17 years of research (culminating in PBFT in 1999). The message complexity bounds are exponential, which the paper acknowledges but does not resolve. And the treatment of digital signatures assumes a perfect signature scheme, ignoring key management, revocation, and the possibility of key compromise.

These are not criticisms — the paper solved the fundamental problem and left the engineering to others. But they are worth noting because the gap between the paper’s model and reality is where production bugs live.

Summary

The Byzantine Generals Problem establishes the fundamental costs of tolerating arbitrary failures:

  • Without signatures: You need 3f+1 nodes to tolerate f Byzantine faults. No fewer will suffice.
  • With signatures: You need f+2 nodes, but practical protocols still use 3f+1 for efficiency.
  • The cost is real: More nodes, more messages, more latency, more code, more operational complexity.
  • Most systems do not need it: If your nodes are trusted and your failure model is crash-recovery, crash-fault tolerance (2f+1 nodes) is sufficient and significantly cheaper.

The decision to use or not use BFT is an engineering tradeoff, not a religious one. Understand what Byzantine failures look like in your environment, assess the probability and cost of those failures, and compare that against the overhead of BFT. For most internal systems, the answer is “crash-fault tolerance is sufficient.” For cross-organizational systems, multi-tenant systems with untrusted participants, and high-value targets, BFT earns its keep.

In Part II of this book, we will study the protocols that solve consensus under both crash and Byzantine failure models. You will see that the theoretical constraints from this chapter — 2f+1 for crash, 3f+1 for Byzantine, the FLP impossibility, the role of timeouts and randomization — manifest directly in every design choice those protocols make. The theory is not separate from the practice; it is the practice, expressed in mathematical language.