HotStuff and the Linear BFT Revolution

PBFT gave us practical Byzantine fault tolerance. Then it gave us O(n^2) message complexity per consensus decision and a view-change protocol so complicated that getting it right became a rite of passage in systems research. For twenty years, the BFT community tried to do better. Many protocols improved specific aspects — SBFT reduced the common-case latency, Zyzzyva optimized for the optimistic case, Tendermint simplified the structure — but the quadratic communication bottleneck and the horrifying view-change complexity persisted like a load-bearing wall that nobody could figure out how to remove.

Then, in 2018, Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, and Ittai Abraham published HotStuff. The key insight was disarmingly simple in retrospect: use threshold signatures to aggregate votes, reducing per-phase communication from O(n^2) to O(n), and unify the normal case and view change into a single protocol path. The result was a BFT protocol with linear message complexity per view, a view-change protocol that is literally the same as the normal protocol, and enough elegance that Facebook (now Meta) chose it as the basis for their Libra/Diem blockchain’s consensus layer.

Of course, “disarmingly simple” means “took twenty years and some of the best minds in distributed systems to figure out.”

The Core Problem HotStuff Solves

Let’s be precise about what was wrong with PBFT’s communication pattern. In PBFT’s prepare and commit phases, every replica sends a message to every other replica. Each replica independently collects messages and determines when a quorum has been reached. This all-to-all communication pattern has two consequences:

  1. Quadratic message count. Each phase generates O(n^2) messages. With n = 100 replicas, that’s roughly 10,000 messages per phase, and there are two such quadratic phases per decision. The network saturates.

  2. Complex view changes. Because every replica independently collects its own view of the quorum, the state that must be transferred during a view change is complex. Each replica has its own set of collected certificates, and the new leader must reconcile these. This is why PBFT’s view-change protocol is so intricate.

HotStuff’s insight: what if the leader collected all the votes and produced a single, compact proof that a quorum was reached? Instead of every replica telling every other replica “I voted,” every replica tells the leader “I voted,” and the leader combines these votes into a single quorum certificate (QC) using a threshold signature. Then the leader broadcasts this single QC to all replicas. Each phase goes from O(n^2) to O(n).
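To make the difference concrete, here is a back-of-the-envelope count of vote traffic for one phase under each communication pattern. This is a sketch — the function names are mine, and the count ignores client traffic and retransmissions:

```python
# Back-of-the-envelope message counts for one voting phase:
# all-to-all (PBFT-style) vs. leader-aggregated (HotStuff-style).

def all_to_all_messages(n: int) -> int:
    """Every replica sends its vote to every other replica: O(n^2)."""
    return n * (n - 1)

def leader_aggregated_messages(n: int) -> int:
    """Replicas send votes to the leader, then the leader broadcasts
    one quorum certificate back out: O(n)."""
    return (n - 1) + (n - 1)  # n-1 votes in, n-1 QC broadcasts out

for n in (4, 16, 100):
    print(n, all_to_all_messages(n), leader_aggregated_messages(n))
```

At n = 100 this is 9,900 messages versus 198 for a single phase — the "~10,000 messages per phase" figure from above.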

Threshold Signatures: The Enabling Technology

Before diving into HotStuff’s protocol, we need to understand threshold signatures, because they’re the cryptographic primitive that makes linear complexity possible.

A (k, n) threshold signature scheme allows any k out of n parties to collaboratively produce a signature that can be verified with a single public key. No individual party can produce a valid signature alone. The critical properties are:

  • Aggregation. Individual signature shares from different parties can be combined into a single signature of constant size, regardless of how many parties contributed.
  • Verification. The combined signature can be verified with a single public key verification, regardless of n.
  • Threshold. At least k shares are needed to produce a valid combined signature.

For BFT with n = 3f + 1, we set k = 2f + 1. A quorum certificate is a threshold signature on a message, proving that at least 2f + 1 replicas signed it.

Common threshold signature schemes include BLS (Boneh-Lynn-Shacham) signatures, which have a natural aggregation property, and threshold RSA or ECDSA schemes that require more complex setup.

The cost is non-trivial: BLS signature verification is significantly more expensive than standard signature verification (pairing operations on elliptic curves), and the DKG (distributed key generation) setup requires its own protocol. HotStuff’s authors acknowledged this tradeoff — you’re trading communication complexity for cryptographic computation. Whether this is a net win depends on your deployment: in a WAN setting where network latency dominates, it almost always is. In a LAN setting with fast networks and many small messages, the crypto overhead might hurt.
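The k-of-n threshold property itself can be illustrated without pairings. The toy sketch below uses Shamir secret sharing over a prime field: any k of n shares reconstruct a secret via Lagrange interpolation, while fewer than k reveal nothing. This is only a model of the threshold math — it is not a signature scheme, and it is not how BLS aggregation works internally:

```python
# Toy k-of-n threshold via Shamir secret sharing over a prime field.
# Illustrates only the "any k of n suffice" property, NOT a real
# threshold signature scheme.
import random

P = 2**127 - 1  # a Mersenne prime, used as the field modulus

def make_shares(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def combine(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

# n = 3f+1 = 4 replicas, quorum k = 2f+1 = 3
shares = make_shares(secret=123456789, k=3, n=4)
assert combine(shares[:3]) == 123456789   # any 3 shares suffice
assert combine(shares[1:]) == 123456789   # a different 3 also work
```

In a real scheme the shares are signature *shares* over a message, and `combine` produces a constant-size signature verifiable with one public key — but the k-of-n arithmetic is the same idea.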

The HotStuff Protocol

HotStuff operates in a sequence of views, each with a designated leader. The protocol proceeds in three phases: prepare, pre-commit, and commit, followed by a decide step. Each phase follows the same star-shaped communication pattern: replicas send votes to the leader, the leader aggregates them into a QC, and broadcasts the QC with the next proposal.

The Crucial Abstraction: Generic HotStuff

Before presenting the specific phases, let’s look at the generic framework. HotStuff’s elegance comes from recognizing that each phase has the same structure:

// Generic phase at the leader
function generic_phase(proposal, phase_name):
    // Step 1: Leader broadcasts proposal (with QC from previous phase)
    broadcast(Proposal{
        phase:    phase_name,
        node:     proposal,
        qc:       highest_qc,
        view:     current_view,
        leader:   my_id
    })

    // Step 2: Collect votes from replicas
    votes = {}
    while count(votes) < 2 * f + 1:
        vote = receive_vote()
        if verify_vote(vote) and vote.view == current_view:
            votes[vote.sender] = vote.partial_sig

    // Step 3: Aggregate into QC
    qc = threshold_combine(votes)
    return qc

// Generic vote at a replica
function generic_vote(proposal):
    // Verify the proposal
    if not verify_proposal(proposal):
        return

    if not safe_to_vote(proposal):
        return

    // Send vote (partial threshold signature) to leader
    partial_sig = threshold_sign(proposal.node)
    send_to_leader(Vote{
        view:        current_view,
        node:        proposal.node,
        partial_sig: partial_sig,
        sender:      my_id
    })

Every phase is: leader proposes with a QC, replicas vote, leader aggregates into a new QC. Three iterations of this pattern, and you have consensus. Compare this with PBFT’s three distinct phase structures with different message formats, different quorum rules, and a completely different view-change protocol.

The Three Phases in Detail

Now let’s be specific about what each phase accomplishes and why three phases are necessary.

// ============ PREPARE PHASE ============
// Leader proposes a new block/command
function leader_prepare():
    // Create proposal extending the highest known QC
    proposal = create_node(
        parent:   highest_qc.node,
        command:  next_client_command(),
        qc:       highest_qc    // "justify" this proposal
    )

    msg = PrepareMsg{
        view:     current_view,
        node:     proposal,
        justify:  highest_qc
    }
    broadcast(msg)

function replica_on_prepare(msg):
    // Safety check: does this proposal extend from a safe branch?
    if not safe_node(msg.node, msg.justify):
        return  // Refuse to vote

    // Liveness check: is this from the current view's leader?
    if msg.view != current_view:
        return

    if leader_of(msg.view) != msg.sender:
        return

    // Vote
    partial_sig = threshold_sign(msg.node)
    send_to_leader(Vote{PREPARE, msg.view, msg.node, partial_sig})

// ============ PRE-COMMIT PHASE ============
// Leader has prepareQC, broadcasts it
function leader_pre_commit(prepareQC):
    msg = PreCommitMsg{
        view:     current_view,
        justify:  prepareQC    // Proof that 2f+1 voted to prepare
    }
    broadcast(msg)

function replica_on_pre_commit(msg):
    // Verify the QC
    if not verify_qc(msg.justify):
        return

    // Update locked QC — this is the safety-critical step
    // pre_commit_qc locks the replica on this branch
    locked_qc = msg.justify  // Lock on the prepareQC

    partial_sig = threshold_sign(msg.justify.node)
    send_to_leader(Vote{PRE_COMMIT, msg.view, msg.justify.node, partial_sig})

// ============ COMMIT PHASE ============
// Leader has precommitQC, broadcasts it
function leader_commit(precommitQC):
    msg = CommitMsg{
        view:     current_view,
        justify:  precommitQC
    }
    broadcast(msg)

function replica_on_commit(msg):
    if not verify_qc(msg.justify):
        return

    partial_sig = threshold_sign(msg.justify.node)
    send_to_leader(Vote{COMMIT, msg.view, msg.justify.node, partial_sig})

// ============ DECIDE ============
// Leader has commitQC — consensus reached!
function leader_decide(commitQC):
    msg = DecideMsg{
        view:     current_view,
        justify:  commitQC
    }
    broadcast(msg)

function replica_on_decide(msg):
    if not verify_qc(msg.justify):
        return

    // Execute the committed command
    execute(msg.justify.node.command)
    // Respond to client
    send_reply_to_client(msg.justify.node)

Message Flow: Normal Case

With n = 4 (f = 1):

Client      Leader       Replica 1    Replica 2    Replica 3
  |            |             |             |             |
  |--REQUEST-->|             |             |             |
  |            |             |             |             |
  |            |---PREPARE (proposal + highQC)---------->|
  |            |<--votes-----+-------------+-------------+
  |            | (aggregate into prepareQC)              |
  |            |             |             |             |
  |            |---PRE-COMMIT (prepareQC)--------------->|
  |            |<--votes-----+-------------+-------------+
  |            | (aggregate into precommitQC)            |
  |            |             |             |             |
  |            |---COMMIT (precommitQC)----------------->|
  |            |<--votes-----+-------------+-------------+
  |            | (aggregate into commitQC)               |
  |            |             |             |             |
  |            |---DECIDE (commitQC)-------------------->|
  |            |             |             |             |
  |<--REPLY----|             |             |             |

Each leader broadcast (long rightward arrow) goes to every replica; each votes line stands for one vote sent to the leader by each replica.

Total messages per decision: 3 * (n + n) + n = 7n. For n = 4, that’s 28 messages. Compare with PBFT’s ~32 messages for n = 4. The savings aren’t dramatic at small n. But at n = 100: HotStuff sends ~700 messages versus PBFT’s ~20,000. That’s where linear versus quadratic matters.

The Safety Rule: When Is It Safe to Vote?

The safe_node function is the heart of HotStuff’s safety argument. A replica votes for a proposal if:

function safe_node(node, justify_qc):
    // Safety condition: the proposal must either extend the branch
    // we're locked on, OR the justify QC is from a higher view
    // than our locked QC (proving the system has moved on)

    extends_locked = is_ancestor(locked_qc.node, node)
    higher_qc = justify_qc.view > locked_qc.view

    return extends_locked or higher_qc

This is the locking mechanism. During the pre-commit phase, a replica “locks” on the prepareQC. It will only vote for future proposals that either:

  1. Extend the locked branch — the new proposal builds on the locked node, so there’s no conflict.
  2. Have a higher QC — someone else got a more recent quorum certificate, proving the system has progressed past what we locked on, so it’s safe to unlock.

This two-part rule is what allows HotStuff to be both safe and live. The lock prevents conflicting commits (safety), and the ability to unlock with a higher QC prevents deadlock (liveness).
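A runnable sketch of the rule, with minimal Node and QC records invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    id: str
    parent: Optional["Node"] = None

@dataclass
class QC:
    node: Node
    view: int

def is_ancestor(ancestor: Node, node: Node) -> bool:
    """Walk parent links from `node` looking for `ancestor`."""
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False

def safe_node(node: Node, justify_qc: QC, locked_qc: QC) -> bool:
    # Vote iff the proposal extends our locked branch, OR its justify
    # QC is from a higher view than our lock (safe to unlock).
    extends_locked = is_ancestor(locked_qc.node, node)
    higher_qc = justify_qc.view > locked_qc.view
    return extends_locked or higher_qc

# Tiny chain: a <- b <- c, replica locked on b at view 5
a = Node("a"); b = Node("b", parent=a); c = Node("c", parent=b)
locked = QC(node=b, view=5)
assert safe_node(c, QC(node=b, view=5), locked)         # extends the lock
fork = Node("fork", parent=a)
assert not safe_node(fork, QC(node=a, view=4), locked)  # conflicts, lower view
assert safe_node(fork, QC(node=a, view=6), locked)      # higher QC unlocks
```

The last two assertions show both halves of the dilemma: a stale fork is rejected (safety), but a fork justified by a more recent quorum is accepted (liveness).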

View Changes: The Beautiful Part

Here’s where HotStuff earns its elegance. In PBFT, the view-change protocol is a completely separate, complex protocol with its own message types, its own quorum logic, and its own verification procedures. In HotStuff, the view change is… the same protocol.

function on_view_timeout():
    // View change triggered by timeout
    // Send our highest QC to the new leader
    new_leader = leader_of(current_view + 1)

    msg = NewView{
        view:       current_view + 1,
        highest_qc: my_highest_qc,
        sender:     my_id
    }
    send(new_leader, msg)
    current_view += 1

// New leader collects NEW-VIEW messages
function leader_on_new_view():
    new_view_msgs = {}
    while count(new_view_msgs) < 2 * f + 1:
        msg = receive_new_view()
        if verify(msg) and msg.view == current_view:
            new_view_msgs[msg.sender] = msg

    // Pick the highest QC from the collected messages
    highest_qc = max(msg.highest_qc for msg in new_view_msgs.values(),
                     key=lambda qc: qc.view)

    // Now just run the normal protocol, extending from highest_qc
    // This IS the view change. That's it.
    leader_prepare()  // Uses highest_qc as the justify

That’s it. The view change consists of: (1) replicas send their highest QC to the new leader, (2) the new leader picks the highest one, and (3) the new leader runs the normal prepare phase using that QC.
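The leader’s only view-change computation is "pick the highest QC." A runnable distillation, with hypothetical NewView and QC records invented for illustration:

```python
from collections import namedtuple

# Hypothetical records for illustration
QC = namedtuple("QC", ["view", "node_id"])
NewView = namedtuple("NewView", ["sender", "highest_qc"])

def pick_starting_qc(new_view_msgs: list[NewView]) -> QC:
    """The entire view-change decision: among 2f+1 NEW-VIEW messages,
    take the QC from the highest view and extend from it."""
    return max((m.highest_qc for m in new_view_msgs),
               key=lambda qc: qc.view)

msgs = [
    NewView("r1", QC(view=7, node_id="b7")),
    NewView("r2", QC(view=9, node_id="b9")),  # most recent quorum
    NewView("r3", QC(view=8, node_id="b8")),
]
assert pick_starting_qc(msgs) == QC(view=9, node_id="b9")
```

Everything after this one `max` is just the normal prepare phase.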

Compare this with PBFT’s view change, which requires assembling prepared certificates from 2f + 1 replicas, computing the O set of re-proposals for every sequence number in the watermark window, broadcasting the entire set for verification, and having every backup independently recompute the O set to verify the new leader isn’t cheating.

The reason HotStuff can get away with this simpler view change is the three-phase structure with the locking mechanism. The locked QC carried in the NEW-VIEW messages provides enough information for the new leader to determine the safe point to extend from. The safe_node rule at replicas ensures they won’t vote for anything that conflicts with a committed decision.

This unification of normal case and view change is, in my opinion, HotStuff’s most important contribution. It doesn’t just reduce complexity — it eliminates an entire class of bugs. Every PBFT implementation I’ve seen has had view-change bugs that didn’t exist in the normal-case code, because the view-change code was tested less and was fundamentally more complex. With HotStuff, if the normal case works, the view change works.

Chained HotStuff: Pipelining Consensus

Basic HotStuff requires three phases (plus decide) for each consensus decision. Chained HotStuff observes that these phases are independent across different proposals and can be pipelined.

The key insight: instead of running prepare/pre-commit/commit sequentially for one proposal and then starting the next, each new proposal effectively advances the previous proposals through their phases.

// Chained HotStuff: each proposal serves double duty
function leader_propose_chained():
    // Create a new proposal
    node = create_node(
        parent:  highest_qc.node,
        command: next_command(),
        qc:      highest_qc
    )

    broadcast(Proposal{view: current_view, node: node})

    // This proposal simultaneously:
    // - Starts PREPARE for the new command
    // - Acts as PRE-COMMIT for the parent (1 phase ago)
    // - Acts as COMMIT for the grandparent (2 phases ago)
    // - Triggers DECIDE for the great-grandparent (3 phases ago)

function replica_on_proposal_chained(msg):
    if not safe_node(msg.node, msg.node.qc):
        return

    // Walk back along the QC chain
    b_star = msg.node            // Current proposal
    b_double = b_star.qc.node    // Parent (1-chain)
    b_single = b_double.qc.node  // Grandparent (2-chain)
    b = b_single.qc.node         // Great-grandparent (3-chain)

    // PRE-COMMIT step for the grandparent: lock on the 2-chain node
    if b_double.qc.view > locked_qc.view:
        locked_qc = b_double.qc  // Locks b_single

    // Three-chain commit rule: if the links b <- b_single <- b_double
    // are direct parent links, the great-grandparent b is committed
    if parent(b_double) == b_single and parent(b_single) == b:
        execute_up_to(b)

    // Vote for the current proposal
    partial_sig = threshold_sign(msg.node)
    send_to_leader(Vote{msg.view, msg.node, partial_sig})

Pipelining Visualization

View 1:   Leader proposes cmd1
           |
View 2:   Leader proposes cmd2 (carries QC for cmd1)
           cmd1 is now PREPARED (1-chain)
           |
View 3:   Leader proposes cmd3 (carries QC for cmd2)
           cmd2 is now PREPARED (1-chain)
           cmd1 is now PRE-COMMITTED (2-chain)
           |
View 4:   Leader proposes cmd4 (carries QC for cmd3)
           cmd3 is now PREPARED (1-chain)
           cmd2 is now PRE-COMMITTED (2-chain)
           cmd1 is now COMMITTED (3-chain) → execute cmd1!
           |
View 5:   Leader proposes cmd5 (carries QC for cmd4)
           cmd4 is PREPARED
           cmd3 is PRE-COMMITTED
           cmd2 is COMMITTED → execute cmd2!

In steady state, Chained HotStuff commits one command per view with only one round of communication per view (leader broadcasts, replicas respond). The latency to commit a specific command is still three views, but the throughput is one command per view.
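The three-chain commit check from the visualization can be made concrete. In this sketch each block carries a QC for an earlier block plus a direct parent link — the Block record is invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Block:
    cmd: str
    view: int
    parent: Optional["Block"] = None   # direct parent link
    qc_node: Optional["Block"] = None  # block certified by the carried QC

def committed_block(b_star: Block) -> Optional[Block]:
    """Chained HotStuff commit rule: follow QCs back three steps and
    commit the great-grandparent if the last two links are direct."""
    b_double = b_star.qc_node
    b_single = b_double.qc_node if b_double else None
    b = b_single.qc_node if b_single else None
    if b is None:
        return None  # chain not three QCs deep yet
    if b_double.parent is b_single and b_single.parent is b:
        return b
    return None

# Happy path: cmd1 <- cmd2 <- cmd3 <- cmd4, each QC certifying its parent
b1 = Block("cmd1", 1)
b2 = Block("cmd2", 2, parent=b1, qc_node=b1)
b3 = Block("cmd3", 3, parent=b2, qc_node=b2)
b4 = Block("cmd4", 4, parent=b3, qc_node=b3)
assert committed_block(b4) is b1   # cmd1 commits when cmd4 arrives
assert committed_block(b3) is None # pipeline not deep enough yet
```

This also shows the stall cost: until a proposal three QCs deep arrives, nothing commits — which is exactly the pipeline-stall issue discussed below.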

This pipelining is elegant, but there’s a catch the paper mentions briefly: if a leader is faulty and doesn’t produce a valid QC, the pipeline stalls. A view change means the current view’s proposal doesn’t get a QC, which means the previous proposals don’t advance through their phases. After three consecutive leader failures, you’re three views behind on commits. In a network with Byzantine participants actively trying to disrupt progress, this can significantly impact throughput. The pacemaker (discussed below) is supposed to handle this, but doing so efficiently is harder than it sounds.

The Pacemaker: Liveness Without Synchrony Assumptions

HotStuff’s protocol provides safety regardless of timing. But liveness — actually making progress — requires some form of synchronization to ensure enough replicas are in the same view at the same time, talking to the same leader.

The pacemaker is the component responsible for this. It’s explicitly separated from the safety protocol, which is a clean design choice. The paper specifies properties the pacemaker must satisfy but leaves the implementation somewhat open. Here’s one common approach:

// Pacemaker: ensures replicas eventually synchronize on the same view
function pacemaker():
    // Start a timer for the current view
    timer = start_timer(timeout_for_view(current_view))

    while true:
        if received_valid_proposal(current_view):
            // Good — leader is alive, reset timer
            reset_timer(timer)

        if timer_expired():
            // Leader seems faulty, initiate view change
            broadcast_timeout_certificate()
            advance_view()

function timeout_for_view(view):
    // Exponential backoff to handle cascading failures. The exponent
    // is the count of consecutive timeouts, not the view number, so
    // the timeout resets after successful progress.
    return BASE_TIMEOUT * 2^consecutive_timeouts

function advance_view():
    // Collect timeout certificates from 2f+1 replicas
    // proving the view should change
    tc = collect_timeout_certificates(current_view)

    if count(tc) >= 2 * f + 1:
        current_view += 1
        consecutive_timeouts += 1
        // Send NEW-VIEW to next leader with highest QC
        send_new_view(leader_of(current_view))

function on_successful_commit():
    // Reset backoff on successful progress
    consecutive_timeouts = 0

The pacemaker is where the “eventual synchrony” assumption lives. In a purely asynchronous network, the pacemaker might never synchronize replicas, and the system might never make progress. The assumption is that eventually, the network becomes synchronous enough for the pacemaker to align replicas on the same view with a correct leader.
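The backoff schedule is worth making concrete. A sketch with an assumed base timeout of 100 ms (real systems tune this):

```python
BASE_TIMEOUT_MS = 100  # assumed base value for illustration

def view_timeout_ms(consecutive_timeouts: int) -> int:
    """Double the timeout after each failed view so that, during an
    asynchronous period, replicas eventually wait long enough to
    overlap in the same view with a correct leader."""
    return BASE_TIMEOUT_MS * (2 ** consecutive_timeouts)

# Backoff schedule across cascading failures
assert [view_timeout_ms(k) for k in range(5)] == [100, 200, 400, 800, 1600]
# After a successful commit, consecutive_timeouts resets to 0
assert view_timeout_ms(0) == BASE_TIMEOUT_MS
```

The doubling guarantees that any finite post-GST message delay is eventually exceeded; the reset keeps the common case fast.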

The Responsiveness Property

HotStuff has a property called optimistic responsiveness: in the normal case (correct leader, no faults), the protocol proceeds at the speed of the network, not at the speed of a predetermined timeout. The leader waits for 2f + 1 votes and immediately proceeds — it doesn’t wait for a timer to expire.

This matters in practice because networks have variable latency. A protocol that proceeds at “actual network speed” will outperform one that waits for a conservative timeout. PBFT also has this property in its normal case, but PBFT’s view-change protocol does not — it relies on timeouts to detect a faulty primary, and during the view change, progress is gated by timeout expiry.

HotStuff’s unified view change inherits responsiveness: the new leader waits for 2f + 1 NEW-VIEW messages and immediately proceeds. No timeout-gated phases during recovery. This means recovery from a faulty leader is as fast as the network allows, not as slow as the most conservative timeout.

PBFT vs HotStuff: Direct Comparison

Let’s compare them head-to-head.

Message Complexity

| Metric                      | PBFT                           | HotStuff                               |
|-----------------------------|--------------------------------|----------------------------------------|
| Messages per phase (normal) | O(n^2)                         | O(n)                                   |
| Phases per decision         | 2 quadratic + 1 linear         | 3 linear                               |
| Total messages per decision | O(n^2)                         | O(n)                                   |
| View change messages        | O(n^3) worst case              | O(n)                                   |
| Authenticator complexity    | O(n^2) MACs or O(n) signatures | O(n) partial sigs + O(1) threshold sig |

Latency

| Metric                                     | PBFT                             | HotStuff                                           |
|--------------------------------------------|----------------------------------|----------------------------------------------------|
| Network round trips (normal)               | 3 (pre-prepare, prepare, commit) | 3 (prepare, pre-commit, commit) + decide           |
| Network round trips (with pipelining)      | N/A in base protocol             | 1 per decision (Chained HotStuff, steady state)    |
| View change round trips                    | 2+ (VIEW-CHANGE, NEW-VIEW)       | 1 (NEW-VIEW, then normal protocol)                 |
| Crypto operations per replica per decision | O(n) MAC or sig verifications    | O(n) partial sig verifications + threshold combine |

Throughput

| Replicas (n) | PBFT (ops/sec, est.) | HotStuff (ops/sec, est.) | Ratio |
|--------------|----------------------|--------------------------|-------|
| 4            | 80,000               | 60,000                   | 0.75x |
| 16           | 20,000               | 40,000                   | 2.0x  |
| 64           | 2,000                | 25,000                   | 12.5x |
| 128          | < 500                | 15,000                   | 30x+  |

Note: these numbers are approximate and depend heavily on implementation quality, hardware, network conditions, batch size, and signature scheme. The point is the trend: at small n, PBFT’s simpler crypto can win. At larger n, HotStuff’s linear communication dominates.

Complexity of Implementation

| Aspect                       | PBFT                         | HotStuff                  |
|------------------------------|------------------------------|---------------------------|
| Normal case protocol         | Moderate                     | Simple                    |
| View change protocol         | Very complex                 | Same as normal case       |
| Cryptographic setup          | Standard PKI                 | Threshold key setup (DKG) |
| State management             | Complex watermark/checkpoint | Simpler chain-based       |
| Lines of code (typical impl) | 5,000 - 15,000               | 3,000 - 8,000             |
| View change bugs in practice | Common                       | Rare (it’s the same code) |

The Tradeoff

HotStuff is not strictly better than PBFT. The tradeoffs:

  1. Crypto overhead. Threshold signatures (especially BLS) are computationally expensive. A BLS pairing operation takes ~1-2ms, compared to ~0.05ms for an Ed25519 verification. For small n where network overhead isn’t the bottleneck, PBFT with MACs can be faster.

  2. Leader bottleneck. In HotStuff, the leader processes all votes and aggregates them. It’s a star topology, and the leader does more work than any other replica. PBFT’s all-to-all communication distributes the work more evenly. A Byzantine leader in HotStuff can selectively delay aggregation to degrade performance, and replicas can’t easily detect this until the timeout fires.

  3. Latency. Both protocols have three-phase latency in the normal case. HotStuff’s decide step adds a fourth message delay compared to PBFT (which doesn’t have an explicit decide broadcast — replicas commit independently after collecting the commit certificate). Chained HotStuff amortizes this with pipelining, but individual request latency is still three views.

  4. DKG requirement. Setting up threshold signatures requires a distributed key generation protocol, which is itself a multi-round protocol that can be disrupted by Byzantine participants. PBFT just needs a standard PKI. This makes HotStuff harder to bootstrap and harder to handle key rotation.

LibraBFT / DiemBFT: HotStuff in Production

Facebook’s Libra project (later renamed Diem, later shut down) chose HotStuff as the basis for their consensus protocol. LibraBFT (later DiemBFT) made several practical modifications:

Key Modifications from Base HotStuff

  1. Explicit timeout certificates. DiemBFT added timeout certificates (TCs) as first-class objects. When 2f + 1 replicas timeout, they form a TC that proves the view should change. This gives a concrete mechanism for the pacemaker.

  2. Two-chain commit rule. DiemBFT v4 modified the commit rule to require only a two-chain (two consecutive QCs) instead of HotStuff’s three-chain. This reduces commit latency from 3 round trips to 2 at the cost of a more complex safety argument. The trick involves using the timeout certificates to prove safety — if a view times out, the TC provides evidence that allows safe unlocking.

  3. Decoupled execution. DiemBFT separates consensus (ordering) from execution. Blocks are ordered by consensus but executed asynchronously. This allows consensus to proceed at network speed while execution happens in the background.

  4. Reputation-based leader selection. Instead of round-robin leader rotation, DiemBFT uses a reputation mechanism: leaders who produce blocks and respond promptly get selected more often. Leaders who fail to produce blocks get deprioritized. This helps the pacemaker converge on good leaders faster.

// DiemBFT leader selection with reputation
function select_leader(view):
    // Build reputation scores from recent history
    scores = {}
    for replica in all_replicas:
        scores[replica] = base_score

        // Reward for producing blocks
        blocks_produced = count_blocks_by(replica, recent_window)
        scores[replica] += blocks_produced * PRODUCE_WEIGHT

        // Penalize for timeouts (failed to lead)
        timeouts_caused = count_timeouts_by(replica, recent_window)
        scores[replica] -= timeouts_caused * TIMEOUT_PENALTY

    // Deterministic selection based on scores and view number
    // (all replicas compute the same result)
    sorted_replicas = sort_by_score(scores)
    return sorted_replicas[view % len(sorted_replicas)]

Performance Results from DiemBFT

The DiemBFT team published benchmarks showing:

| Configuration | Throughput (TPS) | Latency (ms) | Network |
|---------------|------------------|--------------|---------|
| n = 4, LAN    | 160,000          | < 1          | 10 Gbps |
| n = 33, LAN   | 80,000           | 2-5          | 10 Gbps |
| n = 100, LAN  | 30,000           | 10-20        | 10 Gbps |
| n = 10, WAN   | 5,000            | 300-500      | Global  |

These numbers include batching and pipelining. Without batching, throughput drops by 10-50x, similar to PBFT.

The Diem project was shut down in 2022 for regulatory rather than technical reasons. The HotStuff-based consensus code lives on in the Aptos blockchain (which hired many of the Diem engineers) and in various other projects that adopted or adapted the protocol.

Why Three Phases and Not Two?

A question that comes up frequently: PBFT has three phases, HotStuff has three phases, but CFT protocols like Raft get by with essentially two phases (leader proposes, followers accept). Why does BFT always seem to need three?

The answer is the commit-availability dilemma. In BFT, there’s a fundamental tension:

  • Safety requires that once a value is committed, no conflicting value can ever be committed, even across view changes.
  • Liveness requires that after a view change, the new leader can propose a new value if the old leader’s proposal didn’t complete.

With two phases, you can have safety or liveness across view changes, but not both. Here’s the intuitive argument:

After the first phase (prepare), a quorum has voted for a value. After the second phase (commit), a quorum has confirmed they know about the first quorum.

  • If you commit after one quorum (two phases), the new leader after a view change might not know about the commitment (because only the committing replicas know, and they might not be in the new leader’s quorum). You can fix this by requiring the new leader to learn about it — but then the new leader is blocked until it hears from enough replicas, which kills liveness.

  • The third phase (which creates a QC certifying the second-phase QC) ensures that enough replicas are “locked” on the committed value that any quorum the new leader contacts will contain at least one locked replica. This locked replica will inform the new leader, who can then safely re-propose the committed value. Without the third phase, the lock isn’t strong enough.

Some protocols (including DiemBFT v4) achieve two-phase commits by exploiting additional information (like timeout certificates), but the fundamental tension remains and the safety argument becomes more subtle.
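The arithmetic underpinning the locking argument is quorum intersection: with n = 3f + 1 and quorums of size 2f + 1, any two quorums overlap in at least f + 1 replicas, so even if f of those are Byzantine, at least one honest locked replica sits in both. A quick check:

```python
def min_quorum_overlap(n: int, q: int) -> int:
    """Smallest possible intersection of two quorums of size q out of n
    replicas (by inclusion-exclusion)."""
    return max(0, 2 * q - n)

for f in range(1, 6):
    n, q = 3 * f + 1, 2 * f + 1
    overlap = min_quorum_overlap(n, q)
    # Overlap is f+1: subtracting the f possibly-Byzantine replicas
    # still leaves at least one honest replica in both quorums.
    assert overlap == f + 1
    assert overlap - f >= 1
```

That guaranteed honest replica is exactly the "locked replica" in the third-phase argument above: it appears in the new leader’s quorum and reports the lock.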

Criticisms and Limitations

HotStuff is a significant advance, but it’s not without limitations:

  1. Leader centrality. Every message goes through the leader. The leader is a single point of performance — if it’s slow, the whole system is slow. A Byzantine leader can selectively censor transactions by not including them in proposals. Detection is possible but delayed.

  2. Threshold signature setup. DKG is complex and requires its own fault tolerance. If the DKG is compromised, the threshold signature scheme fails, and HotStuff’s aggregation doesn’t work. This is a bootstrapping problem that the paper waves a hand at.

  3. Chaining rigidity. In Chained HotStuff, a leader failure doesn’t just lose one view’s proposal — it stalls the pipeline for the proposals that were in earlier phases. Three consecutive leader failures mean the pipeline is empty and three views of proposals are lost. Recovery involves filling the pipeline again, adding latency.

  4. Still O(n) per view. Linear is better than quadratic, but for very large n (thousands of nodes), O(n) per view is still significant. Some newer protocols aim for sub-linear communication using sampling or committee-based approaches, though they introduce additional assumptions.

  5. Practical crypto challenges. BLS signatures on common curves (BLS12-381) have verification times around 1-2ms. With batching, you can use aggregate verification (~2ms for verifying n signatures at once), but the leader’s aggregation step becomes a bottleneck. EdDSA-based threshold schemes are faster but less mature.

Where HotStuff Fits

HotStuff is the right choice when:

  • You need BFT with more than ~20 replicas.
  • You can invest in threshold signature infrastructure.
  • View-change correctness is a priority (it should always be, but here it comes for free).
  • You’re building a blockchain or permissioned network with known validators.

HotStuff is less ideal when:

  • You have fewer than 10 replicas and network bandwidth isn’t a concern. PBFT may be simpler to deploy (no DKG).
  • You need leaderless operation. HotStuff is inherently leader-based.
  • Threshold signature infrastructure is unavailable or too expensive to set up.
  • You need sub-second latency in a WAN setting. Three round trips across continents add up.

The protocol’s lasting contribution isn’t just the linear complexity — it’s the demonstration that normal-case operation and view changes can be unified into a single, clean protocol structure. Every BFT protocol designed after HotStuff has to explain why it’s not just using HotStuff’s framework. That’s the mark of a good idea.