
The Reentrancy Connection

Here is the central thesis of this chapter: Ethereum smart-contract reentrancy attacks are exception-safety bugs. Not analogous to exception-safety bugs. Not metaphorically related. Structurally identical. The same shape of mistake, with the same fix. The only thing different is the costume.

If this seems like a stretch, stay with me. By the end of the chapter I want this to feel obvious.

The shape of the problem, abstracted

In any programming language with the following properties, the bug exists:

  1. There is a function F that mutates state across multiple steps.
  2. Between two of those steps, F calls another function G.
  3. G’s execution is, from F’s perspective, opaque — F does not know what G will do.
  4. G is capable of re-entering F’s context: calling back into F, observing F’s partially-mutated state, or transferring control such that the rest of F is not yet executed.

Substitute concrete things for F, G, and “re-enter”:

| F | G | “re-enter” |
| --- | --- | --- |
| C++ function with multi-step state update | A throwing operation | The throw skips the rest of F |
| Method holding a mutex while modifying shared state | Code that releases the lock | Another thread sees partial state |
| Smart contract function | An external contract call | The external contract calls back into F |
| Signal handler | A non-async-signal-safe library call | A second signal arrives |
| Database transaction | Application code that reads | The application reads partially-committed state |

These are, again, not analogous. They are the same problem. In each case, F made a mistake by assuming control would return to it linearly after G returned, and that assumption was wrong. Whether the assumption was wrong because G threw, or because G blocked while another thread interfered, or because G was an external contract that called back, is a matter of detail. The structural error is the same.

A brief refresher: how Ethereum reentrancy works

Smart contracts on Ethereum are programs that hold balances of Ether (or tokens) and execute when called. A contract function may call another contract — and that other contract is, from the caller’s perspective, opaque, since contracts are deployed independently and the caller may not know what the callee does.

The classic vulnerable pattern, in Solidity:

contract Vulnerable {
    mapping(address => uint) public balances;

    function withdraw(uint amount) public {
        require(balances[msg.sender] >= amount);

        // (1) Send the Ether
        (bool success,) = msg.sender.call{value: amount}("");
        require(success);

        // (2) Update the balance
        balances[msg.sender] -= amount;
    }
}

The flaw: at step (1), Ether is sent to msg.sender. If msg.sender is a contract, that contract’s code runs as part of receiving the Ether (Ethereum allows this; receiver code runs synchronously within the sender’s transaction). That code can call withdraw again, recursively, before step (2) has run, which means balances[msg.sender] is still its original value. The check at the top still passes. Money flows out a second time, and a third, and again, until the gas runs out, the source contract is drained, or the attacker chooses to stop.

The DAO hack of 2016 was exactly this. Approximately $50 million in Ether was drained from a contract that held investor funds. A community-divisive hard fork was performed to recover most of the funds, splitting the chain into Ethereum and Ethereum Classic. It is the most consequential exception-safety-shaped bug in the history of computing, by dollar value.

Why this is exception safety

Re-read the Solidity code with chapter 1’s vocabulary in mind.

withdraw is a function that mutates state across multiple steps. Between checking the balance and decrementing it, it makes an external call. The external call is opaque — withdraw does not know what msg.sender.call will do. The external call is capable of re-entering withdraw — by calling back into the same contract — and observing the partially-mutated state (specifically, the state where the balance has not yet been decremented).

This is the same mistake as Account::transfer_to from chapter 1, with one variable renamed:

void Account::transfer_to(Account& other, int amount) {
    balance_ -= amount;
    log_transfer(amount, &other);   // throws — control transfers out
    other.balance_ += amount;
}

In Account::transfer_to, control transferred out via throw. The state at that moment: balance_ decremented, other.balance_ not yet incremented. Whoever runs next sees inconsistent state.

In Vulnerable::withdraw, control transferred out via external call. The state at that moment: Ether sent, balances[msg.sender] not yet decremented. Whoever runs next — including a re-entrant call from the very contract being called — sees inconsistent state.

The same partial-mutation, the same “control unexpectedly leaves the function,” the same observable inconsistency. In one case the mechanism is throw, in the other it’s an external call. In one case the observer is the catch handler, in the other it’s a re-entering caller. The bug is the same.

The fix is the same too

Recall the fix for transfer_to from chapter 4: do all the throwing work first, then commit with no-throw operations.

void Account::transfer_to(Account& other, int amount) {
    int new_self = balance_ - amount;
    int new_other = other.balance_ + amount;
    log_transfer(amount, &other);     // throwing work first
    balance_ = new_self;              // no-throw commit
    other.balance_ = new_other;
}

Now look at the fix for the smart contract, by Solidity convention called “checks-effects-interactions”:

function withdraw(uint amount) public {
    // CHECKS
    require(balances[msg.sender] >= amount);

    // EFFECTS (commit state changes first)
    balances[msg.sender] -= amount;

    // INTERACTIONS (external calls last)
    (bool success,) = msg.sender.call{value: amount}("");
    require(success);
}

The order is: validate, then mutate state, then make external calls. By the time the external call runs, the state is fully consistent — if msg.sender’s code re-enters and reads balances, it sees the updated value. The re-entrant call’s require(balances[msg.sender] >= amount) will fail (or correctly succeed against the new, lower balance). No double-spending.

This is exactly the two-phase commit pattern from chapter 4. Compute and commit the no-throw mutations first; do the operations that may transfer control (throw, external call, send Ether) last. The order matters because what comes after the maybe-transferring operation may not happen, and the state must be self-consistent at the moment control could leave.

The Solidity community, having lost $50 million, codified “checks-effects-interactions” as a best practice. The C++ community, having dealt with exception safety for thirty years, codified “two-phase commit” as a best practice. Different communities, different vocabularies, identical pattern.

Other smart-contract incidents that are exception-safety bugs

The DAO is the famous one. Several others are worth knowing:

The Parity multisig wallet, 2017

A library contract used by many wallets had a function that, due to a missed access modifier, could be called by anyone. An attacker called it with arguments that re-initialized the library, then called another function that destroyed it, freezing approximately $300 million in Ether across all wallets that depended on it. The bug was not technically reentrancy, but the shape — a function leaves observable state in a configuration its callers did not expect — is the same family.

The bZx incidents, 2020

Multiple separate attacks on the bZx lending protocol exploited reentrancy-adjacent issues: a borrowed asset’s price could be manipulated by the borrower (using flash loans) before the loan-issuance code re-checked solvency. The attacker took advantage of the gap between “loan issued” and “solvency re-checked,” which is the same gap exception safety exists to close.

Cream Finance, 2021

A reentrancy attack on this lending protocol in August 2021 drained approximately $19 million. The contract called a token’s transferFrom function before updating its internal accounting; the token (AMP, which layers ERC-777-style receiver hooks on top of an ERC-20 interface) called back into the lending contract during the transfer, observed the stale accounting, and borrowed against the same collateral repeatedly.

The pattern is: ERC-777 specifically allows token transfers to invoke receiver hooks. A contract written assuming ERC-20 semantics (transfers don’t trigger receiver code) is vulnerable when used with ERC-777 tokens. The shape of the assumption that broke is “the call I’m making won’t transfer control back to me.” This is the exception-safety assumption again, just at a different protocol layer.

The deeper insight: control-flow assumptions

Here is the abstraction worth carrying forward: every function has implicit assumptions about what the calls it makes will do with control.

In a “safe” call, control returns linearly to the caller, with the callee having done its job. The caller’s reasoning is local: “after this line, the state is X, because that’s what this function does.”

In an “unsafe” call, control may not return linearly:

  • It may not return at all (the callee throws, terminating up the stack).
  • It may return after re-entering the caller (the callee calls back, possibly calling the very function the caller is in).
  • It may be interrupted between the call and the return (a signal arrives, a thread is preempted, a transaction is aborted).

In all three cases, the caller’s local reasoning — “after this line, the state is X” — is wrong, because whatever ran between the call and that line may have changed the state. The caller’s job is to write code that is correct under all of these possibilities, which means not assuming linear control flow across any non-trivial call.

This is exhausting to apply to every line of every function. So we develop discipline: the two-phase commit pattern, checks-effects-interactions, the strong guarantee. Each is a structural answer to “how do I make my code correct without proving the linearity of every call?”

The answer is always the same: finish your state mutations before you make a call whose effect on control flow you don’t fully understand.

A general checklist for “is this code reentrancy/exception-safe?”

For any function F, walk through:

  1. Identify every line where F calls something that might transfer control out of F and back in. This includes: any call that might throw; any call to external code, contracts, or callbacks; any call that releases a held lock or signals other threads; any call that yields in cooperative multitasking.

  2. For each such line, identify the state of F’s observable variables at that moment. “Observable” means visible to anything that could see the state during the period control is out of F: other threads, re-entrant callers, external observers.

  3. Ask: if control returns to F after some other code has run in between, is F’s state still consistent? Are the invariants still true?

  4. If not, restructure: either move the partial mutation so that it happens entirely before or entirely after the transfer-of-control point (so anything observing the state during the transfer sees fully-committed or untouched state, never partial state), or use a lock or guard to make the partial state invisible.

This checklist applies, unchanged, to:

  • C++ functions that throw
  • Smart contract functions that call external contracts
  • Concurrent functions that release locks
  • Signal handlers that call non-reentrant code
  • Database transactions that yield to other transactions

It is the same checklist. The reason exception-safety vocabulary is not used in the smart-contract world is that the smart-contract world inherited its vocabulary from a different family of disciplines — fault-tolerance, distributed systems, security — and arrived at the same place by a different road. The C++ world inherited its vocabulary from systems programming and RAII. They are talking about the same thing.

What the smart-contract community got right that the C++ community didn’t

I want to give credit. The smart-contract community has done two things better than the C++ community in this area:

  1. Static analyzers for reentrancy are widely deployed. Tools like Slither, MythX, and Securify warn on potentially-reentrant patterns by default. They are not perfect, but they catch the obvious cases, and they’re run by default in many CI pipelines. The C++ analog — static analyzers for exception safety — is in nowhere near as healthy a state, as we’ll see in chapter 10.

  2. The “checks-effects-interactions” pattern is part of basic Solidity education. Every introductory Solidity tutorial in 2024 covers it. By comparison, “the strong guarantee” is not part of basic C++ education at most universities, even in advanced courses. Engineers can finish a CS degree and pass a job interview without ever meeting the term.

The cost differential is part of why. A reentrancy bug in a smart contract costs eight figures by the time the dust settles. An exception-safety bug in C++ costs a production incident and a post-mortem. Both are bad; one is more concentrated.

What the C++ community got right that the smart-contract community didn’t

In return:

  1. The strong guarantee is a sharper concept than checks-effects-interactions. “Checks-effects-interactions” is a pattern; “the strong guarantee” is a contract. The C++ vocabulary lets you say “this function provides the strong guarantee” as part of an API contract, with implications for callers. Solidity has nothing comparable. A Solidity function that follows checks-effects-interactions does not, by following it, expose any invariant to its callers — the caller has to inspect the implementation.

  2. The patterns generalize. Two-phase commit, scope guards, copy-and-swap apply to operations beyond the simple “external call last” case. The smart-contract community has the simple case well-handled but tends to flounder on more complex cases (multi-step operations with intermediate external calls, cross-contract atomicity).

If the two communities talked to each other more, both would benefit. The vocabulary on the C++ side is more precise; the tooling on the Solidity side is more deployed. Combining them would produce a better state of the art than either has on its own.

What this chapter wanted to leave you with

  1. Reentrancy is exception safety in another costume. Same problem, same fix, same reasoning patterns. The fact that the two communities developed independent vocabularies for it is a fact about software engineering’s limited self-awareness, not about the problem.

  2. Any function that calls something with non-trivial control-flow semantics is in the same situation. Whether the non-trivial semantics is “might throw,” “might recurse,” “might block,” or “yields to another fiber” is a matter of detail. The structural lesson is: finish your state mutations before you do that.

  3. The two-phase commit pattern is universal. You will see it again in databases, in distributed sagas, in lock-free programming, in signal handlers. Once you recognize it, the variations across domains are just costume.

The next chapter pulls more places out of the woodwork where this same problem hides under different names.

Further reading