What To Actually Do

You have read eleven chapters about a problem that the industry mostly does not think about, in code bases mostly not written with exception safety in mind, in languages mostly without good tooling for it. You presumably cannot rewrite your codebase in Rust tomorrow. You may not even be able to convince your team to read this book. What should you actually do?

This chapter is the practical answer: a short list of disciplines, ordered from “highest impact and lowest cost” to “high impact but harder to deploy.” If you do the first three, you have already moved your code base ahead of 90% of production code.

1. Internalize the vocabulary

This is the cheapest thing on the list and the highest-leverage. You don’t have to write any code. You don’t have to deploy any tooling. You just have to be able to say which guarantee a function provides when you read or write it, with the words “no-throw,” “basic,” “strong,” “no guarantee.”

The discipline shows up in code review as a question: what does this function leave behind if it throws partway through? Once you train yourself to ask this on every PR, you find bugs.

void Cache::evict_lru() {
    auto victim = lru_list_.back();   // (1)
    storage_.erase(victim.key);       // (2) might throw
    lru_list_.pop_back();             // (3)
    --size_;                          // (4)
}

What does this function leave behind if (2) throws? victim is constructed (a copy of the back element). storage_ may have partially mutated. lru_list_ is unchanged. size_ is unchanged.

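If storage_.erase provides the strong guarantee (in practice it does for std::unordered_map: erase by key can throw only from the hash or key-equality function, and implementations call those while locating the element, before removing it), then on its throw, nothing changed in storage_. The cache is consistent: same as before, no eviction happened. This function provides the strong guarantee. Good.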

If storage_.erase provided only the basic guarantee, the cache could be left with storage_ partially modified, but lru_list_ and size_ unchanged. The invariant “every key in storage_ is in lru_list_” might be violated. This function would provide only the basic guarantee. Then the question is: is that good enough? Sometimes yes. Sometimes no.

The vocabulary forces you to ask. The answer is what you wanted; the asking is what was missing.

2. Order operations: throw first, mutate last

The single most useful pattern in this entire book is: if you have a sequence of operations where some can throw and some mutate observable state, put all the throwing operations first.

// BAD
balance_ -= amount;
notify_audit_log(amount);    // might throw
target.balance_ += amount;

// GOOD
notify_audit_log(amount);    // might throw — happens first
balance_ -= amount;
target.balance_ += amount;

Two things to verify:

  1. The throwing operation does not depend on the post-mutation state. (If notify_audit_log needs to know the new balance, you can’t reorder; you’ll need a side-copy approach.)
  2. The mutations that follow the throwing operations do not themselves throw. (If they do, the discipline is recursive: those throwing operations also need to come first, which may not be possible.)

When this discipline applies, it gives you the strong guarantee for free. When it doesn’t, you fall back to scope guards or side-copy patterns. This is the 80% rule, a rule of thumb rather than a measurement: roughly 80% of functions can be made strong-guarantee with simple reordering.
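As a minimal sketch of the reordering, assume a hypothetical Account class and a hypothetical AuditLog whose record() may throw (neither name comes from a real library). Every operation that can throw runs before the first mutation, and the mutations themselves are plain integer arithmetic:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical audit sink: record() throws when the entry cannot be written.
struct AuditLog {
    bool available = true;
    int entries = 0;
    void record(long /*amount*/) {
        if (!available) throw std::runtime_error("audit log unavailable");
        ++entries;
    }
};

// Hypothetical account type, for illustration only.
class Account {
public:
    explicit Account(long balance) : balance_(balance) {}
    long balance() const noexcept { return balance_; }

    // Strong guarantee: all throwing work happens before the first mutation.
    void transfer_to(Account& target, long amount, AuditLog& log) {
        assert(&target != this);                    // precondition, not a throw
        if (amount <= 0 || amount > balance_)       // may throw, nothing mutated yet
            throw std::invalid_argument("bad amount");
        log.record(amount);                         // may throw, nothing mutated yet
        balance_ -= amount;                         // cannot throw
        target.balance_ += amount;                  // cannot throw
    }

private:
    long balance_;
};
```

If record() throws, both balances are exactly what they were before the call, which is the whole point of putting it first.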

3. Use RAII (or your language’s equivalent) without exception

Every resource — memory, file handle, lock, network connection, database transaction — should be wrapped in an object whose destructor releases it. Or, in non-RAII languages, in a with / using / try-with-resources / defer block.

This is not optional. Code that does not use RAII for resource cleanup is broken on exception paths, full stop. There is no excuse for raw new/delete, raw lock/unlock, raw open/close in any modern language.

In code review:

  • Any raw pointer that owns memory: change to unique_ptr / shared_ptr / equivalent.
  • Any explicit unlock after a lock: change to lock_guard / scoped_lock / equivalent.
  • Any explicit close after open: change to with / using / try-with-resources.
  • Any explicit cleanup after a step: scope guard / defer.

This catches the resource-leak class of bug entirely. It does not catch the invariant-violation class, which is what the rest of the disciplines address.
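For the last bullet, a scope guard can be very small. The sketch below assumes C++17 and a hypothetical C-style resource API (acquire_handle and release_handle are invented stand-ins for any open/close pair):

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Minimal scope guard: runs `cleanup` when the enclosing scope exits,
// on the normal path and on the throwing path alike.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F cleanup) : cleanup_(std::move(cleanup)) {}
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ~ScopeGuard() { cleanup_(); }     // cleanup_ must not throw
private:
    F cleanup_;
};

// Hypothetical resource API, standing in for open/close or lock/unlock.
int g_open_handles = 0;
int acquire_handle() { return ++g_open_handles; }
void release_handle(int handle) noexcept {
    assert(handle > 0 && g_open_handles > 0);
    --g_open_handles;
}

void use_handle(bool fail) {
    const int h = acquire_handle();
    ScopeGuard release([h]() noexcept { release_handle(h); });
    if (fail) throw std::runtime_error("work failed");
    // ... normal work with h would go here ...
}   // release_handle(h) runs here whether or not an exception is in flight
```

Where a standard wrapper exists (unique_ptr, lock_guard), prefer it; a guard like this is for the one-off cleanups that have no ready-made type.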

4. Make destructors no-throw

Or, in languages without destructors, make scope-exit cleanups no-throw.

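A destructor that throws during stack unwinding from another exception terminates the program. Since C++11, destructors are also implicitly noexcept by default, so an exception escaping one calls std::terminate even when no other exception is in flight, unless the destructor is declared noexcept(false). If your destructor does anything beyond pointer cleanup (releases a lock, closes a file, sends a message), it might throw. Wrap the body in try { ... } catch (...) { ... }, log the failure, and swallow it.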

~RemoteTransaction() noexcept {
    try {
        if (!committed_) rollback();
    } catch (...) {
        // Swallow: nothing may escape a destructor. warn() must not throw either.
        local_log_.warn("rollback during destruction failed");
    }
}

This is genuinely ugly, and you are right to wince at it. It is necessary. The alternative is std::terminate in production, with no recovery, in a code path that may rarely fire and won’t be exercised in tests.

In Rust, Drop implementations cannot use ? and panicking from Drop is a known antipattern; the equivalent discipline is to log and continue.
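In C++ you can at least have the compiler confirm that a destructor is declared no-throw. A small sketch using std::is_nothrow_destructible (the two types are hypothetical); it checks the declaration only, so a noexcept destructor that lets an exception escape still ends in std::terminate:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types for illustration.
struct QuietCleanup {
    ~QuietCleanup() {}                    // implicitly noexcept since C++11
};

struct LoudCleanup {
    ~LoudCleanup() noexcept(false) {}     // explicitly allowed to throw
};

// Compile-time check of what the destructor is *declared* to do.
template <typename T>
inline constexpr bool has_nothrow_destructor = std::is_nothrow_destructible_v<T>;

static_assert(has_nothrow_destructor<QuietCleanup>);
static_assert(!has_nothrow_destructor<LoudCleanup>);
```

A static_assert like this next to a type definition turns "somebody added noexcept(false)" into a build failure instead of a production surprise.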

5. Make swap no-throw, and use it

For any non-trivial type, define a swap member function that is noexcept. This is the building block for copy-and-swap, and more generally for any pattern where you need to commit a side-built result into place. Standard-library types already do this; user-defined types often do not.

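#include <memory>    // std::unique_ptr
#include <utility>   // std::swap

class Widget {
public:
    void swap(Widget& other) noexcept {
        using std::swap;               // allow ADL, fall back to std::swap
        swap(p_, other.p_);
        swap(state_, other.state_);
    }

private:
    std::unique_ptr<int> p_;           // placeholder member types
    int state_ = 0;
};

void swap(Widget& a, Widget& b) noexcept { a.swap(b); }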

The cost is two functions and a discipline. The benefit is that you have a no-throw primitive available for any of the strong-guarantee patterns from chapter 4.
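To show what the primitive buys you, here is a sketch of copy-and-swap assignment built on it. Profile and its members are hypothetical; the point is that the copy (which may throw) happens in the by-value parameter, and the commit is a single no-throw swap:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value type whose members can throw when copied.
class Profile {
public:
    Profile(std::string name, std::vector<int> scores)
        : name_(std::move(name)), scores_(std::move(scores)) {}

    Profile(const Profile&) = default;
    Profile(Profile&&) noexcept = default;

    // Copy-and-swap: `other` is the side copy, built before this function
    // body runs. If building it throws, *this was never touched.
    Profile& operator=(Profile other) noexcept {
        swap(other);                  // no-throw commit
        return *this;
    }

    void swap(Profile& other) noexcept {
        using std::swap;
        swap(name_, other.name_);
        swap(scores_, other.scores_);
    }

    const std::string& name() const noexcept { return name_; }
    const std::vector<int>& scores() const noexcept { return scores_; }

private:
    std::string name_;
    std::vector<int> scores_;
};

void swap(Profile& a, Profile& b) noexcept { a.swap(b); }
```

The assignment operator gets the strong guarantee without a single try block, at the price of always making a full copy.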

6. For each public mutating function, document the guarantee

In the function’s documentation comment (or in a code-review checklist applied to every public function), declare:

  • What invariants the function preserves.
  • What guarantee it provides under throw.
  • What the caller must do if the function throws.

Example:

/// Insert a new element into the cache.
///
/// Strong guarantee: if this function throws, the cache is unchanged.
/// Throws:
///   - std::bad_alloc if memory allocation fails
///   - InvalidKey if `key` does not satisfy `is_valid_key(key)`
void Cache::insert(Key key, Value value);

This is contract, not commentary. If you change the function later in a way that downgrades the guarantee, the documentation is wrong, which is a visible signal in review. Without the documentation, the guarantee is lost in the noise.

The C++ Standard Library documents the exception behavior of its operations consistently, and since C++11 it also marks no-throw operations with noexcept. The convention in your code base should be the same.

7. Test exception paths explicitly

For any function with non-trivial throw behavior, write tests that exercise the throw path. Inject a controlled throw at a known location; verify the postcondition holds; verify the function’s invariants are preserved.

TEST(Cache, RollbackOnInsertFailure) {
    Cache c;
    c.insert("a", "1");
    c.insert("b", "2");

    // make next allocation fail
    set_alloc_failure_after(0);
    EXPECT_THROW(c.insert("c", "3"), std::bad_alloc);
    set_alloc_failure_after(-1);

    // cache must be in pre-insert state
    EXPECT_EQ(c.size(), 2);
    EXPECT_EQ(c.get("a"), "1");
    EXPECT_EQ(c.get("b"), "2");
    EXPECT_FALSE(c.contains("c"));
}

The infrastructure for “make the next allocation fail” is small (a thread-local counter that operator new checks). The result is that exception paths are tested, not just hoped about. Most exception-safety bugs ship because nobody ever ran the failure path; exercising it catches them.

Property-based testing with throw injection (chapter 10) generalizes this: instead of testing one specific throw point, test all of them. The combination is powerful.
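One possible shape for the allocation-failure hook is sketched below. The name set_alloc_failure_after matches the test above, but the semantics are an assumption: -1 disables injection, and n >= 0 lets n allocations succeed and then fails exactly one before disarming itself. This belongs in test builds only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical test hook. -1 means "never fail"; n >= 0 means "let n
// allocations on this thread succeed, then fail the next one".
thread_local long g_allocs_until_failure = -1;

void set_alloc_failure_after(long n) noexcept { g_allocs_until_failure = n; }

// Replacement global operator new: consult the counter, then defer to malloc.
void* operator new(std::size_t size) {
    if (g_allocs_until_failure == 0) {
        g_allocs_until_failure = -1;          // fail once, then disarm
        throw std::bad_alloc();
    }
    if (g_allocs_until_failure > 0) --g_allocs_until_failure;
    if (void* p = std::malloc(size != 0 ? size : 1)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions, since memory now comes from malloc.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

Sweeping n from 0 upward until the operation under test stops throwing gives you a poor man's version of "test every throw point" for allocation failures.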

8. In concurrent code, do throwing work outside critical sections

The single concurrency-specific discipline that buys the most safety:

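void Cache::insert(const Key& k, const Value& v) {
    auto entry = make_entry(k, v);   // throwing work, no lock held
    std::lock_guard lock(mu_);
    // operator[] can still throw (it may allocate a node), but for the standard
    // map types a failed insertion has no effect; the move-assignment must be no-throw.
    storage_[k] = std::move(entry);
    metadata_.note_insert(k);        // must be no-throw
}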

The principle is to push throwing operations before the lock acquisition, and to ensure that everything inside the critical section either cannot throw or leaves the protected state unchanged when it does (the strong guarantee, or the basic guarantee plus explicit rollback). The result is that the state other threads see once the lock is released is always consistent.

When you cannot do this — read-modify-write that needs the protected state — use atomic compare-exchange or accept the basic guarantee with explicit rollback.
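The compare-exchange variant might look like the following sketch. Quota, its limit, and the exception type are hypothetical; the check that may throw runs against a local snapshot, and the only write to shared state is the compare-exchange itself:

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical quota counter shared between threads.
class Quota {
public:
    explicit Quota(int limit) : limit_(limit) {}

    // Strong guarantee: either `n` units are reserved atomically,
    // or the call throws and used_ is unchanged.
    void reserve(int n) {
        int seen = used_.load();
        for (;;) {
            if (n < 0 || seen > limit_ - n)       // may throw; used_ untouched
                throw std::runtime_error("quota exceeded");
            if (used_.compare_exchange_weak(seen, seen + n))
                return;                           // committed in one atomic step
            // On failure `seen` holds the current value; re-check and retry.
        }
    }

    int used() const noexcept { return used_.load(); }

private:
    const int limit_;
    std::atomic<int> used_{0};
};
```

There is no critical section to leave in a bad state: a throw can only happen before the commit, and the commit cannot be observed half-done.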

9. At system boundaries, prefer idempotency over transactions

For any operation that crosses a network or system boundary, design the operation so it can be retried safely. Two-phase commit at the in-process level is achievable; two-phase commit across systems is hard, expensive, and frequently does not actually work.

The technique:

  • Each operation has an idempotency key.
  • The receiving system records the key on first receipt and ignores duplicates.
  • The sender retries on failure.

This converts “the strong guarantee across the network” into “at-least-once delivery with idempotency-based deduplication,” which is achievable. Stripe’s API famously does this with its Idempotency-Key header; many AWS APIs accept a client-supplied idempotency token (for example, the ClientToken parameter of EC2’s RunInstances); the pattern is widespread because the alternative is “distributed transactions,” which usually don’t work.

This is not exception safety in the C++ sense, but it is exception safety in the system sense: the fact that an operation can be retried without compounding effects is what makes the higher-level “either it happened or it didn’t” actually true.
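On the receiving side, the deduplication step can be sketched in a few lines. PaymentReceiver and its in-memory map are hypothetical; a real service would persist the key and the effect together in one durable write:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical receiver that applies each idempotency key at most once.
class PaymentReceiver {
public:
    // Returns the running total after the charge. A retry with a key that
    // was already seen replays the recorded result and changes nothing.
    long charge(const std::string& idempotency_key, long amount) {
        auto it = seen_.find(idempotency_key);
        if (it != seen_.end()) return it->second;     // duplicate: replay
        const long new_total = total_ + amount;
        seen_.emplace(idempotency_key, new_total);    // may throw; total_ untouched
        total_ = new_total;                           // cannot throw
        return new_total;
    }

    long total() const noexcept { return total_; }

private:
    std::unordered_map<std::string, long> seen_;
    long total_ = 0;
};
```

Note that the receiver itself follows throw-first-mutate-last: recording the key can fail, and if it does, the total has not moved and the sender's retry is still safe.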

10. Adopt at least one tool

Pick one and put it in CI:

  • C++: ASan, TSan, and UBSan plus libFuzzer for the main code (TSan needs its own build; it cannot be combined with ASan); clang-tidy for style and obvious bugs.
  • Java: SpotBugs, ErrorProne. Run them.
  • Rust: cargo clippy (already there). Take warnings seriously.
  • Solidity: Slither. Non-negotiable.
  • Python: mypy for type errors; property-based testing with Hypothesis for invariant violations.
  • Go: golangci-lint with errcheck and ineffassign.

These tools do not catch “the strong guarantee was violated”; they catch the downstream effects of exception-safety bugs (use-after-free, double-free, leaked resources). Catching those gets you most of the way, even though the abstraction is wrong.

11. Code review with a checklist

For PRs that touch mutating code, the reviewer asks:

  1. Does this function mutate state across multiple steps?
  2. Are any of those steps throwing operations?
  3. Are the throwing operations before or after the mutations?
  4. If after, is there a rollback or is the function only providing the basic guarantee?
  5. Is the basic guarantee sufficient for this function’s contract?

If the reviewer cannot answer these questions from reading the code, the PR needs a comment that documents the guarantee, or the code needs restructuring to make it obvious.

This is procedural, not technical. It works if you do it consistently and does not work if you don’t.

12. Recognize the same problem in disguise

Every time you encounter a “weird race condition,” a “cache consistency issue,” a “deployment failure mode,” or a “multi-step API call that left things in a strange state,” ask: is this exception safety with a different mechanism? It usually is.

The fix patterns are the same:

  • Order mutations after throwing/transferring operations.
  • Use a no-throw primitive for the commit step.
  • Roll back on failure with a scope guard or compensating action.
  • Make operations idempotent so retries are safe.
  • Make the partial-state period invisible (locks, transactions, copy-on-write).

Once you recognize the shape, the abstraction transfers across domains. The effort you put into thinking carefully about exception safety in one language pays off the next time you encounter a saga, a smart contract, a signal handler, or a cancelable async future.

A two-sentence summary

Exception safety is the discipline of writing code that is correct even when control flow leaves your function unexpectedly. The mechanism that causes the unexpected exit (throw, panic, abort, external call, lock release, signal, page fault, network failure) doesn’t matter, and the patterns for handling it don’t change with it: RAII for cleanup, two-phase commit for atomicity, and the strong guarantee where it counts.

If you do only the throw-first-mutate-last reordering, the consistent RAII, and the documentation of guarantees on public mutators, you will have done more for the correctness of your code base than 95% of the production code in the world.

A note on humility

I want to close with a note. After eleven chapters and twelve years of thinking about this problem, I am still occasionally surprised by it. A function I would have sworn was strong-guarantee turns out to provide only the basic, because of a noexcept annotation I forgot to read on a member type. A reentrant call I did not see, because I assumed a callback was synchronous and it wasn’t. A signal handler that interacts with stdio in a way nobody noticed in code review.

The problem is genuinely subtle. The patterns help; the patterns are not magic. The discipline is what gets you most of the way, and the discipline is itself fragile because you have to apply it to every function, every line, every change, forever, while also shipping code on time.

I do not fully grok this. You should not fully grok it after one read. The people who appear to fully grok it have, in my experience, had it bite them enough times to develop a defensive crouch toward all mutating code, which is a state I recommend cultivating.

Good luck.

Further reading

  • Herb Sutter, “Exception-Safe Class Design,” Parts 1–3, C/C++ Users Journal, 2002. The clearest practical synthesis of the patterns.
  • Bjarne Stroustrup, “C++ Core Guidelines,” section E (Error handling). https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-errors
  • Peter Seibel, Practical Common Lisp, especially chapters 19–20. For the alternative model.
  • Pat Helland, “Standing on Distributed Shoulders of Giants,” ACM Queue 2016. For the system-level perspective.

The book ends here. The discipline begins now.