BFT in Blockchains vs BFT Everywhere Else

Byzantine fault tolerance had a quiet three decades. From Lamport, Shostak, and Pease’s original 1982 paper through the early 2010s, BFT was a respected but niche area of distributed systems research. Conferences published papers. Graduate students wrote dissertations. A handful of production systems used it. Then Satoshi Nakamoto published a white paper, and suddenly everyone needed to know about Byzantine fault tolerance.

The irony is rich. Nakamoto consensus (proof of work) is arguably the least efficient BFT protocol ever deployed in production. Bitcoin’s network consumes electricity on the scale of a small country, takes minutes to hours to reach probabilistic finality, and processes single-digit transactions per second. But it solved a problem that classical BFT protocols couldn’t: open membership. Anyone can join. No one needs permission. No distributed key generation ceremony, no known validator set, no upfront coordination. That property was so valuable that the world was willing to pay an enormous price for it.

This chapter examines the relationship between BFT and blockchains, the fundamental differences between permissioned and permissionless BFT, and the question that every non-blockchain engineer eventually asks: “Do I actually need Byzantine fault tolerance?”

The Fundamental Divide: Open vs Closed Membership

The single most important distinction in BFT protocols is membership: who gets to participate?

Closed Membership (Permissioned BFT)

Classical BFT protocols — PBFT, HotStuff, Tendermint — all assume a known, fixed set of participants. You know who the replicas are, you have their public keys, and you can count them to determine quorum sizes.

Properties of closed-membership BFT:

n is known. From n you can derive the fault budget f (the largest f with n ≥ 3f + 1) and set quorum sizes precisely.
Identity is established. Every message is signed by a known party.
Sybil resistance is free. An attacker can’t create fake identities because membership is controlled.
Communication is bounded. You know who to send messages to and how many responses to expect.
Finality is deterministic. Once 2f + 1 replicas commit, the decision is final. No probabilistic hand-waving.
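As a minimal sketch of the arithmetic behind these properties, the helper below derives the fault budget and quorum size from a known replica count. The function name `bft_parameters` is illustrative, not taken from any library. The quorum formula is the general one that keeps any two quorums overlapping in at least f + 1 replicas; it reduces to 2f + 1 when n is exactly 3f + 1.

```python
def bft_parameters(n):
    """Return (f, quorum) for a closed-membership BFT cluster of n replicas.

    f is the largest number of Byzantine replicas tolerated (n >= 3f + 1).
    quorum is the smallest vote count such that any two quorums share at
    least f + 1 replicas, so at least one honest replica is in both.
    """
    if n < 1:
        raise ValueError("need at least one replica")
    f = (n - 1) // 3
    quorum = (n + f) // 2 + 1  # equals 2f + 1 when n == 3f + 1
    return f, quorum


for n in (3, 4, 5, 7, 10):
    f, quorum = bft_parameters(n)
    print(f"n={n}: tolerates f={f}, quorum={quorum}")
```

Note that n = 5 tolerates no more faults than n = 4 but needs a larger quorum, which is why BFT deployments usually pick n = 3f + 1 exactly.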

Open Membership (Permissionless BFT)

Nakamoto-style consensus and its descendants allow anyone to participate. You don’t know how many participants there are, you don’t know who they are, and you can’t trust any identity because identities are free.

Properties of open-membership BFT:

n is unknown. You can’t compute quorum sizes in the traditional sense.
Identity is cheap. Creating new identities (Sybil attack) is trivial without some costly resource.
Communication is unbounded. You can’t send messages to “all replicas” because you don’t know who they are.
Finality is probabilistic. Decisions become “more final” over time but are never absolutely irrevocable (in pure Nakamoto consensus).
Participation requires proof of resource. To prevent Sybil attacks, participants must prove they’ve expended some scarce resource: computation (proof of work), capital (proof of stake), storage (proof of space), etc.

This divide is not a spectrum — it’s a categorical difference that shapes every aspect of protocol design:

Aspect	Permissioned BFT	Permissionless BFT
Membership	Known, fixed (or controlled changes)	Open, anyone can join/leave
Identity	Established via PKI or out-of-band	Pseudonymous or anonymous
Sybil resistance	Membership control	Proof of resource expenditure
Quorum definition	2f + 1 out of 3f + 1 known nodes	Longest chain / most accumulated weight
Finality	Deterministic, immediate	Probabilistic, grows over time
Throughput	1K - 100K+ TPS	3-20 TPS (PoW); higher with PoS
Latency to finality	Milliseconds to seconds	Minutes to hours (PoW); seconds (PoS)
Fault tolerance	f < n/3 Byzantine nodes	Less than 50% of hash power (PoW); varies
Energy cost	Negligible	Enormous (PoW); negligible (PoS)
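To illustrate the Sybil-resistance row, here is a small sketch that contrasts one-identity-one-vote with resource-weighted voting. The function names and the resource numbers are invented for illustration; no real protocol tallies votes exactly this way.

```python
def share_by_identity(voters, attacker_ids):
    """Attacker's share of the vote when every identity counts equally."""
    controlled = sum(1 for name in voters if name in attacker_ids)
    return controlled / len(voters)


def share_by_weight(voters, attacker_ids):
    """Attacker's share when each vote is weighted by a scarce resource."""
    total = sum(voters.values())
    controlled = sum(w for name, w in voters.items() if name in attacker_ids)
    return controlled / total


# Three honest participants hold 30 resource units each. One attacker holds
# 10 units and splits them across 10 freshly created identities.
honest = {f"honest-{i}": 30.0 for i in range(3)}
sybils = {f"sybil-{i}": 1.0 for i in range(10)}
voters = {**honest, **sybils}
attacker = set(sybils)

print(share_by_identity(voters, attacker))  # 10 of 13 identities
print(share_by_weight(voters, attacker))    # 10 of 100 resource units
```

Creating identities is free, so counting identities hands the attacker a majority. Weighting by hash power or stake leaves the attacker with exactly the share of the resource they paid for, however many identities they create.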

Nakamoto Consensus: BFT by Another Name

Let’s be precise about what Bitcoin’s proof-of-work consensus actually provides, because it’s surprisingly subtle.

The Protocol

// Nakamoto consensus (simplified)
function mine():
    while true:
        // Select transactions from mempool
        txs = select_transactions()

        // Build block extending the longest chain
        parent = tip_of_longest_chain()
        block = Block{
            parent_hash: hash(parent),
            transactions: txs,
            timestamp: now(),
            nonce: 0
        }

        // Find a nonce that makes the block hash below the target
        while hash(block) > difficulty_target:
            block.nonce += 1

        // Found a valid block!
        broadcast(block)
        add_to_chain(block)

function on_receive_block(block):
    if not validate_block(block):
        reject(block)
        return

    add_to_chain(block)

    // Fork choice rule: always follow the longest chain
    // (actually: chain with most accumulated proof of work)
    if accumulated_work(block.chain) > accumulated_work(current_tip.chain):
        switch_to_chain(block.chain)
        // This might REVERT previously accepted blocks!
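The mining loop above can be made runnable as a toy. This sketch assumes a single SHA-256 over a simple string encoding and a difficulty expressed as leading zero bits; Bitcoin itself uses double SHA-256 over a binary header and a compact target encoding. The `Block` layout and all names here are illustrative.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    parent_hash: str
    transactions: tuple
    nonce: int = 0

    def digest(self):
        """SHA-256 of the block contents, read as a 256-bit integer."""
        payload = f"{self.parent_hash}|{','.join(self.transactions)}|{self.nonce}"
        return int.from_bytes(hashlib.sha256(payload.encode()).digest(), "big")


def target_for(difficulty_bits):
    """A hash is valid if it is below this value."""
    return 1 << (256 - difficulty_bits)


def mine(parent_hash, transactions, difficulty_bits):
    """Increment the nonce until the block hash falls below the target."""
    block = Block(parent_hash, transactions)
    target = target_for(difficulty_bits)
    while block.digest() >= target:
        block.nonce += 1
    return block


def is_valid(block, difficulty_bits):
    """Checking a block costs one hash, however long the search took."""
    return block.digest() < target_for(difficulty_bits)


genesis_hash = "00" * 32
block = mine(genesis_hash, ("alice->bob:5",), difficulty_bits=12)
print(block.nonce, is_valid(block, 12))
```

With 12 difficulty bits the search takes about 4,096 hashes on average, while verification always takes one. That asymmetry is what makes proof of work usable as a Sybil-resistance mechanism.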

The BFT Properties

Nakamoto consensus provides:

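Safety (probabilistic). The probability of a committed transaction being reversed decreases exponentially with the number of confirmation blocks, provided the attacker controls less than 50% of the network’s hash power. After 6 confirmations (~60 minutes for Bitcoin), the Bitcoin white paper’s own calculation puts the success probability of an attacker with 10% of the hash power at roughly 0.02%. An attacker with more than 50% can eventually reverse any transaction.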
Liveness (probabilistic). As long as honest miners control >50% of hash power, new blocks will eventually be produced and transactions will eventually be included.
Censorship resistance. No single party can prevent a valid transaction from eventually being included, as long as honest miners will include it.

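The fault tolerance bound is different from classical BFT: less than 50% of hash power rather than less than 33% of nodes. The higher bound is possible because Nakamoto consensus assumes that blocks propagate across the network much faster than they are mined (a synchrony assumption) and settles for probabilistic rather than deterministic finality. Honest parties never exchange votes or quorum messages; the chain itself is the communication medium. The cost is probabilistic finality and much lower throughput.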
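The exponential decay of reversal risk can be computed directly. The sketch below follows the calculation in section 11 of the Bitcoin white paper, which models the attacker's progress as a Poisson process. It is the white paper's simplified model, not a complete security analysis, and the function name is illustrative.

```python
import math


def attacker_success_probability(q, z):
    """Probability that an attacker with hash-power share q ever catches up
    from z blocks behind, per the Bitcoin white paper's model."""
    if not 0 <= q < 0.5:
        raise ValueError("the model applies to attackers with under 50% hash power")
    p = 1.0 - q
    lam = z * (q / p)  # expected attacker progress while z honest blocks are mined
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


for confirmations in (0, 1, 2, 6):
    print(confirmations, attacker_success_probability(0.1, confirmations))
```

For q = 0.1 and six confirmations the result is about 0.00024, matching the table in the white paper. Raising q toward 0.5 makes the decay much slower, which is why the six-confirmation convention implicitly assumes a small attacker.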

Why Computer Scientists Were Skeptical

When Bitcoin first appeared, many distributed systems researchers were dismissive. The throughput was laughable (7 TPS), the latency was absurd (60 minutes for reasonable safety), and the energy consumption was unconscionable. By the metrics that classical BFT cared about, Nakamoto consensus was a terrible protocol.

What the researchers initially missed was that Bitcoin optimized for a different metric: permissionless participation. The ability to join the network without anyone’s permission, to mine blocks without establishing identity, and to transact without trusting any specific party was novel and, for certain applications, worth the enormous cost.

The subsequent decade of blockchain research has been, in many ways, an attempt to get the permissionless property without paying the Nakamoto tax. Proof of stake, committee-based BFT, and various hybrid approaches all try to shrink the gap between permissioned BFT’s performance and permissionless BFT’s openness.

Why Most Non-Blockchain Systems Don’t Need BFT

Here’s the argument that should be your default: if you’re not building a blockchain or a system with mutually distrusting operators, you probably don’t need BFT. Let me justify this.

The Threat Model Argument

BFT protects against Byzantine faults: nodes that behave arbitrarily, including lying, equivocating, sending contradictory messages to different peers, and actively trying to sabotage the protocol. For BFT to be worth its cost, the Byzantine threat must be realistic.

In a typical enterprise deployment:

You control all the nodes. They run your software, on your infrastructure, managed by your team. A “Byzantine” node in this context means either a bug or a security compromise.
Bugs are usually crash faults. Most software bugs cause crashes, hangs, or incorrect output that’s detectable (wrong format, invalid values). Truly Byzantine bugs — where a node produces valid-looking but incorrect output that other nodes can’t distinguish from correct behavior — are rare.
Security compromises are often all-or-nothing. If an attacker compromises one node in your cluster, they likely have (or will soon have) access to the others, because the nodes share infrastructure, credentials, and access patterns. BFT with f = 1 doesn’t help if the attacker can compromise all nodes.
The cost is significant. BFT requires 3f + 1 nodes instead of 2f + 1: 33% more at f = 1, 40% more at f = 2, and approaching 50% more as f grows. Message complexity is higher. Latency is higher. The implementation is more complex, meaning more bugs, meaning more operational burden.

For these reasons, the vast majority of production systems use crash fault tolerant (CFT) consensus: Raft, Multi-Paxos, Zab, or similar. Google’s Spanner, Amazon’s DynamoDB, Apache Kafka, etcd, CockroachDB, TiKV — all CFT.

The Performance Gap

Let’s quantify what BFT costs compared to CFT:

Metric	CFT (Raft)	BFT (PBFT, n=4)	BFT (PBFT, n=7)	Overhead
Minimum nodes (f=1)	3	4	n/a	+33%
Minimum nodes (f=2)	5	n/a	7	+40%
Messages per decision	O(n)	O(n^2)	O(n^2)	Quadratic
Throughput (typical, LAN)	100K+ ops/s	50-80K ops/s	30-60K ops/s	2-5x lower
Latency (typical, LAN)	<1ms	1-3ms	2-5ms	2-5x higher
Crypto overhead per msg	None or HMAC	Signature verify	Signature verify	Significant
Implementation complexity	Moderate	High	High	2-3x more code
Testing difficulty	Moderate	Very high	Very high	Need Byzantine fault injection

For most workloads, paying a 2-5x performance penalty and significantly higher complexity to protect against a threat that doesn’t materially apply is a poor engineering decision.
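The "Messages per decision" row can be made concrete with a rough count. This sketch assumes the textbook normal case of each protocol and ignores client traffic, batching, heartbeats, and view changes, so the absolute numbers are indicative only. The function names are illustrative.

```python
def raft_messages(n):
    """One AppendEntries to each follower plus one reply from each."""
    return 2 * (n - 1)


def pbft_messages(n):
    """Pre-prepare, prepare, and commit phases of PBFT's normal case."""
    pre_prepare = n - 1          # primary -> each backup
    prepare = (n - 1) * (n - 1)  # each backup -> every other replica
    commit = n * (n - 1)         # every replica -> every other replica
    return pre_prepare + prepare + commit


for f in (1, 2, 3, 10):
    cft_n, bft_n = 2 * f + 1, 3 * f + 1
    print(f"f={f}: Raft n={cft_n} sends {raft_messages(cft_n)}, "
          f"PBFT n={bft_n} sends {pbft_messages(bft_n)}")
```

At f = 1 the gap is 4 messages against 24. Because the PBFT count grows as 2n(n - 1), the gap widens quickly with cluster size, which is the motivation for linear-complexity protocols such as HotStuff.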

When People Think They Need BFT But Don’t

Common scenarios where teams consider BFT but probably shouldn’t:

“Our nodes might have bugs.” Yes, but CFT already handles the most common bug manifestation (crashes). For non-crash bugs, invest in testing, monitoring, and detection rather than BFT. A system that detects and alerts on Byzantine behavior is cheaper and more practical than one that tolerates it.
“We run in multiple clouds.” Multi-cloud deployment protects against cloud provider outages (correlated crashes and partitions), not against Byzantine behavior. Use CFT with replicas spread across providers.
“We don’t trust our partners’ software.” If you’re integrating with partners, the trust boundary is at the API level, not the consensus level. Use contract validation, cryptographic signatures on data, and audit logs rather than BFT consensus.
“We need regulatory compliance.” Regulators care about auditability, data integrity, and availability, not the specific fault tolerance model of your consensus protocol. A CFT system with proper audit logging typically meets these requirements.

When You Actually Need BFT

Having argued against BFT for most cases, let me now make the case for it. There are real scenarios where Byzantine fault tolerance is appropriate outside of public blockchains.

Multi-Party Computation with Untrusted Participants

When multiple organizations need to jointly compute something — a financial settlement, a supply chain verification, a collaborative analysis — and no single organization trusts the others to operate honestly, BFT consensus provides guarantees that CFT cannot.

Example: Multi-bank settlement system.

Bank A         Bank B         Bank C         Bank D
(runs node)    (runs node)    (runs node)    (runs node)
   |              |              |              |
   | (each bank submits transactions)           |
   | (BFT consensus orders them)                |
   | (all banks execute the same order)         |
   | (any bank can verify the computation)      |

In this scenario:

Each bank controls its own node. A compromise of Bank A’s node shouldn’t affect the system’s correctness.
Banks don’t trust each other not to submit conflicting transactions or attempt to double-spend.
The consensus protocol must be correct even if one bank’s node is actively trying to cheat.
CFT would be insufficient: if Bank A’s node crashes in Raft, it just loses its vote. But if Bank A’s node is Byzantine in Raft, it could equivocate and cause inconsistency that Raft can’t detect.

This is the permissioned blockchain use case, and it’s legitimate. Hyperledger Fabric, R3 Corda, and similar projects target this space.
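To make the Raft limitation in the bank scenario concrete, the sketch below brute-forces the smallest overlap between any two quorums. It assumes majority quorums for Raft and 2f + 1 quorums for a 3f + 1 node BFT cluster; `min_quorum_overlap` is an illustrative name.

```python
from itertools import combinations


def min_quorum_overlap(n, quorum):
    """Smallest intersection between any two quorums of the given size."""
    nodes = range(n)
    return min(
        len(set(a) & set(b))
        for a in combinations(nodes, quorum)
        for b in combinations(nodes, quorum)
    )


# Raft with n=3: majority quorums of 2 can share as little as one node.
print(min_quorum_overlap(3, 2))

# BFT with n=4, f=1: quorums of 3 always share at least two nodes.
print(min_quorum_overlap(4, 3))
```

In the three-node Raft case the single shared node may be the Byzantine one, which can tell each quorum a different story, so two quorums can commit conflicting values. In the four-node BFT case the overlap is f + 1 = 2 nodes, so at least one honest node sits in both quorums and will refuse to vote for conflicting values.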

Financial Trading Systems

High-frequency trading systems sometimes use replicated state machines for order matching. When the participants include potentially adversarial traders, BFT prevents a compromised matching engine replica from manipulating order execution.

Supply Chain with Untrusted Participants

Multiple companies in a supply chain — manufacturers, shippers, retailers — need to track goods. If any participant can unilaterally alter the shared record, the system is meaningless. BFT ensures that the shared record is correct even if some participants misbehave.

Critical Infrastructure with Defense-in-Depth

Some safety-critical systems (aviation, nuclear, medical) use BFT not because they expect adversarial behavior but as defense in depth. If a hardware fault causes a node to produce arbitrary outputs (e.g., a bit flip in memory that changes a control signal), BFT ensures the system continues correctly. This is the original motivation for Byzantine fault tolerance from the 1980s, predating blockchains by decades.

Multi-Cloud with Genuine Distrust

This is different from “we run in multiple clouds for availability.” This is: “we run in multiple clouds because we don’t trust any single cloud provider to not be compromised or compelled to tamper with our computation.” Government agencies, organizations handling classified data, and some financial institutions have this genuine concern. BFT across cloud providers ensures that a compromised provider can’t unilaterally affect the computation.

The Cost-Benefit Analysis

Here’s a framework for deciding whether BFT is warranted:

function should_use_bft():
    // Question 1: Is Byzantine behavior a realistic threat?
    byzantine_threat =
        operators_are_mutually_distrusting OR
        nodes_run_different_software_stacks OR
        nodes_are_in_different_security_domains OR
        compromise_of_one_node_is_independent_of_others

    if not byzantine_threat:
        return NO  // Use CFT

    // Question 2: Is the cost acceptable?
    n_required = 3 * f + 1  // vs 2 * f + 1 for CFT
    performance_overhead = 2x to 10x  // vs CFT
    implementation_complexity = HIGH
    operational_complexity = HIGH

    cost_acceptable =
        n_required is feasible AND
        performance_overhead is tolerable AND
        team_has_bft_expertise

    if not cost_acceptable:
        return MAYBE_USE_DETECTION_INSTEAD
        // Monitor for Byzantine behavior, alert, and handle manually

    // Question 3: Is there a simpler alternative?
    simpler_alternative =
        can_use_cryptographic_signatures_on_data OR
        can_use_audit_logs_with_detection OR
        can_use_trusted_hardware (SGX, etc.) OR
        can_restructure_to_avoid_shared_state

    if simpler_alternative:
        return PROBABLY_NO  // Use the simpler thing

    return YES  // BFT is warranted

Most paths through this decision tree lead to “no.” That’s intentional. BFT is expensive and complex; the bar for using it should be high.
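The claim that most paths lead to "no" can be checked mechanically. The sketch below is one possible runnable reduction of the framework, collapsing each of the three questions to a single boolean. The `Verdict` and `Assessment` names are illustrative, and a real assessment would weigh the inputs rather than treat them as yes/no.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Verdict(Enum):
    USE_CFT = "no: use CFT"
    DETECT_INSTEAD = "maybe: use detection instead"
    SIMPLER_ALTERNATIVE = "probably no: use the simpler alternative"
    USE_BFT = "yes: BFT is warranted"


@dataclass
class Assessment:
    byzantine_threat_realistic: bool
    cost_acceptable: bool
    simpler_alternative_exists: bool


def should_use_bft(a):
    """Apply the three questions in the same order as the framework."""
    if not a.byzantine_threat_realistic:
        return Verdict.USE_CFT
    if not a.cost_acceptable:
        return Verdict.DETECT_INSTEAD
    if a.simpler_alternative_exists:
        return Verdict.SIMPLER_ALTERNATIVE
    return Verdict.USE_BFT


# Enumerate every combination of answers and count the "yes" outcomes.
outcomes = [should_use_bft(Assessment(*flags)) for flags in product([False, True], repeat=3)]
print(sum(v is Verdict.USE_BFT for v in outcomes), "of", len(outcomes))
```

Only one of the eight answer combinations ends in BFT: a realistic threat, an acceptable cost, and no simpler alternative.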

Hybrid Approaches

The binary choice between CFT and BFT is a false dichotomy. Several hybrid approaches exist:

BFT for Ordering, CFT for Execution

Use BFT consensus to agree on the order of operations, then execute those operations on trusted infrastructure using simpler protocols. Hyperledger Fabric has this shape: a dedicated ordering service sequences transactions, while the peer nodes that execute them rely on simpler endorsement policies. Note that Fabric v2’s production ordering service is Raft-based and therefore only crash fault tolerant; a BFT ordering service arrived with Fabric v3.

Detect-and-Recover Instead of Tolerate

Instead of tolerating Byzantine faults in real-time (which requires 3f + 1 nodes), detect them after the fact and recover:

// Byzantine detection approach
function detect_and_recover():
    // All nodes execute and sign their results
    for node in nodes:
        result[node] = node.execute(operation)
        sig[node] = node.sign(result[node])

    // Verify agreement
    if all_results_match(result):
        return result[nodes[0]]  // All good; any node's result will do

    // Disagreement detected — identify Byzantine node
    majority_result = find_majority(result)
    for node in nodes:
        if result[node] != majority_result:
            flag_as_byzantine(node)
            // Alert, investigate, replace

    return majority_result

This approach works when:

Immediate tolerance isn’t required (you can afford brief incorrect behavior).
Detection is sufficient deterrent (the Byzantine node faces consequences).
Recovery is feasible (you can replace or patch the faulty node).

Many practical systems use this approach: they run with CFT consensus but add cryptographic auditing to detect Byzantine behavior retroactively.
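One common form of such auditing is a hash-chained log, where each entry commits to the one before it. The sketch below is a minimal illustration using SHA-256 over JSON-encoded records; the entry layout and function names are assumptions, not any particular product's format.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64


def entry_hash(prev_hash, record):
    """Hash of a record bound to the hash of the previous entry."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256(f"{prev_hash}|{body}".encode()).hexdigest()


def append(log, record):
    """Append a record, chaining it to the current head of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def first_tampered_index(log) -> Optional[int]:
    """Return the index of the first entry that breaks the chain, or None."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return None


log = []
for amount in (100, 250, 75):
    append(log, {"from": "bank-a", "to": "bank-b", "amount": amount})

print(first_tampered_index(log))   # chain is intact
log[1]["record"]["amount"] = 999   # simulate a retroactive edit
print(first_tampered_index(log))   # the edit is detected at entry 1
```

A hash chain on its own only detects edits by someone who cannot rewrite every later entry. For it to catch a misbehaving operator, auditors need an independent copy of a recent head hash, or each entry needs a signature from the party that submitted it.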

Trusted Execution Environments (TEEs)

Hardware enclaves like Intel SGX, AMD SEV, or ARM TrustZone can provide integrity guarantees at the hardware level. If you trust the hardware, a node running in a TEE can’t produce Byzantine outputs (ignoring side-channel attacks and hardware bugs, which is a big “if”).

Using TEEs, you can potentially run CFT consensus with BFT-like guarantees:

Approach	Nodes Required (f=1)	Performance	Assumptions
Pure CFT	3	Highest	Crash faults only
CFT + TEE	3	High (TEE overhead)	Trust hardware vendor
Pure BFT	4	Moderate	f < n/3 Byzantine
BFT + TEE	3	Moderate (TEE overhead)	Can reduce n; trust hardware

The TEE approach has been used in several systems, notably Microsoft’s CCF (Confidential Consortium Framework), which uses SGX enclaves to provide BFT-like guarantees with CFT-like node counts.

The catch: trusting hardware is a strong assumption. SGX has had multiple vulnerability disclosures (Foreshadow, Plundervolt, AEPIC). Whether hardware trust is more or less reasonable than trusting your replicas to be non-Byzantine depends on your threat model.

Optimistic BFT

Run the system optimistically assuming no Byzantine faults (essentially CFT performance). If a Byzantine fault is detected, fall back to the full BFT protocol.

// Optimistic BFT (simplified Zyzzyva-style)
function optimistic_commit(request):
    // Fast path: all replicas respond with the same result
    results = broadcast_and_collect(request)

    if all_match(results) and count(results) == 3 * f + 1:
        // All replicas agree — commit immediately
        // One network round trip!
        return results[0]

    else:
        // Disagreement or missing responses
        // Fall back to full BFT protocol (PBFT-like)
        return slow_path_bft(request)

The optimistic path gives CFT-like performance (one round trip) when all nodes behave correctly. The slow path provides safety when Byzantine faults occur. The downside: the slow path is more complex and slower than standard BFT because it needs to handle the transition from optimistic to pessimistic mode.

BFT Protocol Selection Guide

Given the landscape, here’s a practical guide:

For Permissionless Blockchains

Need	Recommendation	Why
Maximum decentralization	Nakamoto (PoW)	No identity required; proven at scale
Better performance	PoS with BFT finality	Ethereum 2.0 (Casper FFG), Cosmos (Tendermint)
Throughput focus	Committee-based BFT	Algorand, Solana (modified)
Simple smart contracts	Established L1	Use existing ecosystem, don’t build consensus

For Permissioned Blockchains

Need	Recommendation	Why
< 20 validators	PBFT or variant	Simple crypto, well-understood
20-200 validators	Tendermint/CometBFT	Production-tested, ecosystem
> 200 validators	HotStuff variant	Linear complexity necessary
Maximum throughput	HotStuff + pipelining	Chained HotStuff or DiemBFT

For Non-Blockchain Systems

Need	Recommendation	Why
Trusted operators	CFT (Raft, Paxos)	BFT overhead not warranted
Untrusted operators, < 20 nodes	PBFT	Simple, well-understood
Untrusted operators, > 20 nodes	HotStuff	Linear complexity
Hardware trust available	CFT + TEE	Fewer nodes, good performance
Detection sufficient	CFT + audit	Simplest solution with accountability

Comparison Table: BFT Protocols Across Domains

Protocol	Domain	Membership	Fault Tolerance	Finality	Throughput	Latency
PBFT	General BFT	Closed	f < n/3	Deterministic	10K-80K ops/s	1-10ms
HotStuff	General BFT	Closed	f < n/3	Deterministic	10K-100K+ ops/s	5-20ms
Tendermint	Blockchain	Closed*	f < n/3	Deterministic	1K-10K TPS	1-7s
Raft	General CFT	Closed	f < n/2	Deterministic	100K+ ops/s	<1ms
Nakamoto PoW	Blockchain	Open	< 50% hashrate	Probabilistic	3-7 TPS	~60 min
Casper FFG	Blockchain	Open**	f < n/3 stake	Deterministic	Varies	~15 min
Algorand	Blockchain	Open	f < n/3 stake	Deterministic	1K+ TPS	~4s

*Tendermint’s validator set can change over time via governance, but at any given height, membership is known.
**Casper FFG uses economic bonding to establish a known validator set from an open pool of potential validators.

Case Study: The Same Problem, Three Solutions

To make the BFT versus CFT versus blockchain decision concrete, consider a real scenario: three banks want to run a shared settlement system. They don’t fully trust each other. Each bank processes approximately 10,000 transactions per day that need to be jointly ordered and settled.

Solution A: Central Trusted Party + CFT

Appoint one bank (or a neutral third party) as the operator. Run a 3-node Raft cluster under their control. The other banks submit transactions via API and trust the operator’s system to be correct.

// Architecture: Central operator with CFT
Operator runs: 3-node Raft cluster
Bank A: submits transactions via authenticated API
Bank B: submits transactions via authenticated API
Bank C: reads results, verifies against own records

Fault tolerance: Crash faults in operator's cluster
Trust model: All banks trust the operator
Performance: 100K+ ops/sec, <1ms latency
Cost: 3 servers, standard ops team

Pros: Simple, fast, well-understood technology.
Cons: Requires trusting the operator. If the operator is compromised or malicious, all bets are off. The other banks have no way to verify the operator didn’t reorder, drop, or fabricate transactions.

Solution B: Permissioned BFT

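Each bank runs one (or more) BFT replicas. Use PBFT or a similar protocol with n = 4: one replica per bank plus a fourth run by a neutral party (or one per bank if there are four banks). The fourth replica is not a tie-breaker; it is what brings n up to 3f + 1 for f = 1.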

// Architecture: Permissioned BFT
Bank A runs: 1 PBFT replica
Bank B runs: 1 PBFT replica
Bank C runs: 1 PBFT replica
Neutral party runs: 1 PBFT replica (or a 4th bank)

Fault tolerance: 1 Byzantine fault (f=1, n=4)
Trust model: Any 1 party can be fully Byzantine
Performance: 50K-80K ops/sec, 1-5ms latency (LAN)
Cost: 4 servers across 4 organizations, BFT expertise needed

Pros: No single party needs to be trusted. Any one bank can be compromised without affecting correctness. Every bank can independently verify the settlement log.
Cons: Requires BFT expertise. Higher latency if banks are geographically distributed. More complex operations (4 organizations coordinating software upgrades, key rotation, etc.).

Solution C: Blockchain

Deploy a permissioned blockchain (e.g., Hyperledger Fabric, or a Cosmos chain with the three banks as validators).

// Architecture: Permissioned blockchain
Bank A runs: Validator node + application
Bank B runs: Validator node + application
Bank C runs: Validator node + application

Consensus: Tendermint-based (3 validators, can tolerate 0 Byzantine!)
// Wait: with n=3, the largest f that satisfies n >= 3f+1 is 0. That's not useful.
// Need n=4 for f=1. Add a 4th validator.

Fault tolerance: Same as Solution B
Trust model: Same as Solution B
Performance: 1K-10K TPS, 1-7s block time
Cost: 4 servers, blockchain expertise, smart contract development

Pros: Gets the BFT guarantees plus an ecosystem of tools (explorers, wallets, smart contracts). Audit trail is built in.
Cons: Significantly lower throughput than raw BFT. Block-based latency. Requires blockchain-specific expertise in addition to distributed systems expertise. The “blockchain” label may help or hurt politically depending on your organization.

The Verdict

For this scenario, Solution B is likely the best fit: the genuine distrust between banks justifies BFT, but the closed membership and moderate transaction volume (about 30,000 transactions per day in total, well under one per second on average) don’t require blockchain infrastructure. Solution A is appropriate if the banks can agree on a trusted operator (they often can’t). Solution C is appropriate if the banks want the broader ecosystem features or plan to expand to many participants.

The point isn’t that one solution is universally better — it’s that the choice depends entirely on the trust model and operational requirements, and getting the trust model wrong means either paying for security you don’t need or not getting the security you do.

The Future: Convergence or Divergence?

The blockchain world and the distributed systems world have been on converging paths:

Blockchain is adopting classical BFT. Ethereum’s move to proof of stake with Casper FFG incorporates BFT finality. Cosmos was built on BFT from the start. Many newer blockchains use committee-based BFT.
Classical systems are adopting blockchain ideas. The concept of a verifiable, append-only log — the blockchain’s core data structure — has influenced systems like AWS QLDB, Hyperledger, and various audit-log systems. Even teams not using BFT consensus are using Merkle trees and hash chains for data integrity.
The lines are blurring. Tendermint is a “blockchain consensus protocol” used for non-blockchain applications. PBFT is a “classical BFT protocol” used in blockchain systems. HotStuff was published as an academic protocol and deployed in a blockchain. The protocol doesn’t care what you call the system it runs in.

What remains different is the deployment context. A 3-node Raft cluster in a single data center and a 100-validator Tendermint network spanning the globe face fundamentally different engineering challenges, even though both are solving the consensus problem. The choice between CFT and BFT, between permissioned and permissionless, between immediate and probabilistic finality — these are driven by the trust model and operational context, not by the protocol’s intrinsic properties.

The honest conclusion: for most systems, most of the time, crash fault tolerance is sufficient. When it’s not, the reason is almost always that the operators don’t trust each other — and that’s a social problem that technology can help with but not fully solve. BFT consensus gives you the ability to cooperate with parties you don’t trust. That’s a remarkable capability. Just make sure you actually need it before paying the price.

The Agony of Consensus Algorithms