The Common Lisp Condition System

I want to be careful in this chapter. Lisp partisans have been telling the rest of the industry that Lisp solved exception handling forty years ago, the rest of the industry has been declining to listen, and the result is a bad-faith argument on both sides. Lisp partisans usually overstate their case (the condition system is not a silver bullet, and the things it enables are useful but not always wanted). The non-Lisp world usually understates it (the condition system is not a fancy try/catch; it does things that try/catch cannot do, however try/catch is dressed up).

What I’m going to argue here is the narrower claim: the Common Lisp condition system is strictly more expressive than any mainstream exception system, the additional expressiveness solves a specific class of exception-safety problem that other systems cannot, and the reasons it is not more widely adopted have more to do with history and sociology than with technical merit.

You do not have to like Lisp to learn from this chapter. You do have to be willing to read parenthesized code for ten minutes.

What `try`/`catch` actually is

Before showing what’s different, let’s nail down what mainstream exception handling does.

When you throw (in any of C++/Java/C#/Python/JavaScript), the call stack between the throw site and the catch site is unwound. The frames are gone. Whatever local state existed in those frames is destroyed (in C++, via destructors; in GC languages, made unreachable). Control resumes in the catch handler, which runs in the frame of the function that contains the try block; everything deeper than that frame no longer exists.

This is terminating exception handling. The thrown error terminates everything between the throw and the catch.

Two consequences:

The handler cannot fix the problem at the site of the failure. It can only pick up the pieces at a higher level. If a parser at depth 12 of the call stack hits a malformed token and you’ve got the catch at depth 1, you can decide what to do at depth 1, but you cannot decide to keep parsing from depth 12 — frame 12 is gone.
The handler has no information beyond what the throw site put in the exception object. If the throw site forgot to include some piece of context, you can’t go back and ask. The frames that had that context are destroyed.

These are not limitations of any particular implementation; they are constitutive of what termination means. Every try/catch system in mainstream use is a termination system.

What the condition system adds

Common Lisp’s condition system separates signaling a condition from deciding what to do about it. When code signals, the stack is not immediately unwound. Instead, the runtime searches the active handlers, most recently established first, and calls each applicable one with the condition while every frame down to the signal point is still alive. A handler can choose:

Decline. The handler returns normally, and the next handler up the stack gets a chance.
Handle by transferring control. This is the equivalent of catch: the handler makes a non-local exit to a point established further up the stack (handler-case packages this up for you), unwinding every frame in between.
Handle by invoking a restart. This is the new thing. The handler can invoke a named recovery action established somewhere between itself and the signal point, usually by the signaling code itself. Nothing has been unwound when the handler runs. Invoking the restart unwinds only as far as the form that established it, typically right next to the failure; that frame survives with its locals, the recovery code runs there, and execution continues normally from that point.

That last option is what termination systems cannot do. The try/catch model destroys the signaling frames before any handler runs; the condition system keeps them alive while the handler decides, and lets recovery happen in place.
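The smallest way to see a handler run before anything is unwound is the standard cerror function, which signals an error but first establishes a continue restart; if a handler invokes that restart, cerror simply returns and the signaling function carries on with its next form. A minimal sketch, in which risky-divide and divide-with-recovery are hypothetical names:

```lisp
;; Hypothetical example functions; only CERROR, HANDLER-BIND and
;; INVOKE-RESTART are standard Common Lisp.
(defun risky-divide (a b)
  (when (zerop b)
    ;; CERROR establishes a CONTINUE restart, then signals a SIMPLE-ERROR.
    ;; If a handler invokes CONTINUE, CERROR returns NIL and execution
    ;; continues with the next form, with A and B still in scope.
    (cerror "Use 1 as the divisor." "Cannot divide ~a by zero." a)
    (setf b 1))
  (/ a b))

(defun divide-with-recovery (a b)
  (handler-bind ((error
                   (lambda (c)
                     (declare (ignore c))
                     ;; This runs on top of RISKY-DIVIDE's live frame;
                     ;; nothing has been unwound yet.
                     (invoke-restart 'continue))))
    (risky-divide a b)))
```

(divide-with-recovery 10 0) returns 10: the handler chose a recovery, and the recovery happened inside risky-divide. The restart-case restarts used in the rest of this chapter differ slightly, in that invoking one unwinds to the restart-case form; but that form is normally placed right beside the failure.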

A concrete example

Here is a small example, in Common Lisp. We have a function that parses lines from a log file, with the option of letting the user fix bad lines and try again, skip the line, or use a default value.

(define-condition malformed-log-line (error)
  ((line :initarg :line :reader bad-line)))

(defun parse-line (line)
  (restart-case
      (if (well-formed-p line)
          (parse-it line)
          (error 'malformed-log-line :line line))
    (use-default ()
      :report "Use a default empty entry."
      (make-empty-entry))
    (skip-line ()
      :report "Skip this line and continue."
      nil)
    (use-value (new-line)
      :report "Use a corrected version of the line."
      :interactive (lambda () (list (read-line)))
      (parse-line new-line))))

(defun parse-log (file)
  (with-open-file (s file)
    ;; READ-LINE returns NIL at end of file. PARSE-LINE returns NIL for
    ;; a skipped line, and WHEN ... COLLECT IT leaves those out.
    (loop for line = (read-line s nil)
          while line
          when (parse-line line) collect it)))

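restart-case establishes three named restarts: use-default, skip-line, and use-value. The last reuses the name of a restart that Common Lisp predefines, so the standard use-value function can invoke it too. Each clause has a name, a parameter list, an optional :report string, and a body that runs only if the restart is invoked. When parse-line signals a malformed-log-line condition inside the dynamic extent of the restart-case, the restarts are available but not yet invoked.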
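Because restarts are ordinary runtime objects, a handler can look at what is on offer before choosing. A small sketch, assuming a cut-down check-line (a hypothetical stand-in for parse-line) and the standard compute-restarts and restart-name functions:

```lisp
(define-condition malformed-log-line (error)
  ((line :initarg :line :reader bad-line)))

;; Hypothetical stand-in for PARSE-LINE: every line is treated as malformed.
(defun check-line (line)
  (restart-case (error 'malformed-log-line :line line)
    (use-default () :report "Use a default empty entry." :default)
    (skip-line () :report "Skip this line and continue." nil)))

;; Returns the names of the restarts visible to the handler.
(defun visible-restart-names (line)
  (let ((seen '()))
    (handler-bind ((malformed-log-line
                     (lambda (c)
                       ;; COMPUTE-RESTARTS lists the active restarts that
                       ;; apply to C, most recently established first.
                       (setf seen (mapcar #'restart-name (compute-restarts c)))
                       (invoke-restart 'skip-line))))
      (check-line line))
    seen))
```

An implementation may append restarts of its own, such as ABORT, to that list, so code that inspects it should look for specific names rather than rely on positions.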

Now the caller can choose how to handle:

;; (1) Skip all malformed lines silently.
(handler-bind ((malformed-log-line
                (lambda (c)
                  (declare (ignore c))
                  (invoke-restart 'skip-line))))
  (parse-log "/var/log/messy.log"))

;; (2) Substitute a default for malformed lines.
(handler-bind ((malformed-log-line
                (lambda (c)
                  (declare (ignore c))
                  (invoke-restart 'use-default))))
  (parse-log "/var/log/messy.log"))

;; (3) Decline: return normally, so the next handler up the stack sees it.
(handler-bind ((malformed-log-line
                (lambda (c)
                  (declare (ignore c))
                  ;; Returning without invoking a restart declines.
                  nil)))
  (parse-log "/var/log/messy.log"))

;; (4) Drop into the debugger interactively.
(parse-log "/var/log/messy.log")
;; -> if no handler takes the condition, ERROR enters the debugger, which
;;    lists the available restarts (the exact format varies by
;;    implementation, which also adds its own, such as ABORT):
;;    1: USE-DEFAULT  Use a default empty entry.
;;    2: SKIP-LINE    Skip this line and continue.
;;    3: USE-VALUE    Use a corrected version of the line.

Notice (1) through (3): the same parser code, with the handling policy set by the caller, declaratively, without modifying the parser. This plays the role of catching the exception, but the parse-log loop is not unwound. When invoke-restart 'skip-line fires, only the frames between the error call and the restart-case are discarded; control lands in parse-line, whose restart-case returns nil, and the loop in parse-log continues with the next line.
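To watch the policies run without a log file, here is a self-contained variant of the parser that reads from a list of strings. It is a sketch under stated assumptions: well-formed-p, parse-it, and make-empty-entry are hypothetical stand-ins that treat a line as well formed when it looks like key=value, and parse-with-policy is a hypothetical helper that invokes one chosen restart for every malformed line.

```lisp
(define-condition malformed-log-line (error)
  ((line :initarg :line :reader bad-line)))

;; Hypothetical helpers: an entry is a (key . value) pair.
(defun well-formed-p (line) (position #\= line))
(defun parse-it (line)
  (let ((i (position #\= line)))
    (cons (subseq line 0 i) (subseq line (1+ i)))))
(defun make-empty-entry () (cons "" ""))

(defun parse-line (line)
  (restart-case
      (if (well-formed-p line)
          (parse-it line)
          (error 'malformed-log-line :line line))
    (use-default ()
      :report "Use a default empty entry."
      (make-empty-entry))
    (skip-line ()
      :report "Skip this line and continue."
      nil)
    (use-value (new-line)
      :report "Use a corrected version of the line."
      :interactive (lambda () (list (read-line)))
      (parse-line new-line))))

;; Same loop shape as PARSE-LOG, over a list instead of a file.
(defun parse-lines (lines)
  (loop for line in lines
        when (parse-line line) collect it))

;; Caller-side policy: invoke RESTART-NAME (with ARGS) for every bad line.
(defun parse-with-policy (lines restart-name &rest args)
  (handler-bind ((malformed-log-line
                   (lambda (c)
                     (declare (ignore c))
                     (apply #'invoke-restart restart-name args))))
    (parse-lines lines)))
```

For example, (parse-with-policy '("a=1" "junk" "b=2") 'skip-line) returns (("a" . "1") ("b" . "2")), 'use-default puts a ("" . "") entry where the bad line was, and (parse-with-policy '("a=1" "junk") 'use-value "x=9") substitutes the corrected line. The parser is identical in every case.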

Notice (4): when no handler takes the condition, the system drops into the debugger and asks the user, at runtime, which restart to invoke. This is not a stack trace. It is a live, interactive choice presented to a human, with the program paused and the human’s answer determining how the program continues. After the choice, execution resumes at the restart point inside parse-line, and the loop carries on.

This is, frankly, magic. It is also old (the design grew out of work from the early 1980s and is mostly attributed to Kent Pitman), and it has been part of the ANSI Common Lisp standard, unchanged, since 1994.

Why this is more powerful than `try`/`catch`

Termination systems can model restart-style recovery only by encoding the recovery as a return value or callback parameter, threaded through every intermediate function. The condition system makes it part of the dynamic environment, like exception handling itself.

Concretely:

In a try/catch system, if you want the parser to retry on a corrected line, you have two options. (a) Make parse-line return something like Either<ParsedLine, NeedsRetry>, or accept a recovery callback, and thread that through every function between the parser and whoever owns the policy. (b) Catch the exception, fix the line, and call parse-line again; but that call has to happen at the catch site, at the outer level, by which point the parser’s state (the open stream, the position in the loop, the entries collected so far) is gone.
In the condition system, the parser’s restart-case declares “here are the recovery actions I support.” The caller’s handler-bind says “for this kind of condition, invoke this recovery.” The two are decoupled, the parser keeps its state, and the recovery runs inside the parser.
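Option (a) is easy to sketch, and the sketch shows its cost. Here the policy is an explicit on-malformed function argument, and every layer between the policy owner and the failure site has to accept it and pass it on. All names are hypothetical, and a line counts as well formed when it contains =.

```lisp
;; Hypothetical names throughout. ON-MALFORMED returns either a
;; replacement line or :SKIP.
(defun parse-line* (line on-malformed)
  (if (position #\= line)
      line
      (let ((decision (funcall on-malformed line)))
        (if (eq decision :skip)
            nil
            (parse-line* decision on-malformed)))))

;; These two only forward ON-MALFORMED; they never use it themselves.
(defun parse-section* (lines on-malformed)
  (loop for line in lines
        when (parse-line* line on-malformed) collect it))

(defun parse-document* (sections on-malformed)
  (loop for section in sections
        append (parse-section* section on-malformed)))
```

Only parse-line* ever calls on-malformed; parse-section* and parse-document* carry it purely to forward it. The handler-bind versions of the parser need no such parameter at any level.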

This matters for exception safety in the following way: an operation that would otherwise need the strong guarantee can sometimes avoid needing it by recovering in place. The whole problem of “unwind, partial state, atomicity” only arises because we chose to unwind. The condition system gives us a way not to.

Where the condition system genuinely shines

Three classes of problem:

Validation with user interaction. The parsing example above. A long-running batch process that hits malformed input can pause, ask the operator what to do, and continue from that point, rather than aborting the batch or writing logic to checkpoint and resume.
Library code with policy choices that belong to the caller. A network library that hits a slow response can signal a slow-response condition, with restarts wait-longer, use-cached, give-up. The library doesn’t have to guess what the caller wants. The caller doesn’t have to write a parameter for every possible policy.
Recoverable resource exhaustion. Out of memory? Signal memory-low with restarts use-disk-cache, evict-oldest, give-up. Memory pressure handlers in the condition-system world can be policies registered higher up the stack, transparent to allocation sites.
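The network example can be sketched in a few lines. This is a toy built on assumptions, not a real networking API: slow-response, fetch, *cache*, and the three restarts are hypothetical, and the “network” is simulated by a reply-delay argument compared against a timeout.

```lisp
;; Hypothetical library-side code.
(define-condition slow-response (error)
  ((url :initarg :url :reader slow-url)))

(defvar *cache* (make-hash-table :test #'equal)) ; hypothetical response cache

(defun fetch (url reply-delay timeout)
  ;; REPLY-DELAY simulates how long the server would take to answer.
  (if (<= reply-delay timeout)
      (format nil "fresh:~a" url)
      (restart-case (error 'slow-response :url url)
        (wait-longer (extra)
          :report "Wait EXTRA more seconds."
          (fetch url reply-delay (+ timeout extra)))
        (use-cached ()
          :report "Return the cached response, if any."
          (gethash url *cache*))
        (give-up ()
          :report "Return NIL."
          nil))))

;; Hypothetical caller-side policy: prefer stale data to waiting.
(defun fetch-preferring-cache (url reply-delay)
  (handler-bind ((slow-response
                   (lambda (c)
                     (declare (ignore c))
                     (invoke-restart 'use-cached))))
    (fetch url reply-delay 5)))
```

A batch job that can afford patience would instead bind slow-response to (invoke-restart 'wait-longer 10). fetch never learns which policy is in force, and its signature does not change when a new policy is added.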

Note the pattern: each of these is a place where, in mainstream languages, the choice between “fail” and “recover” happens at the call site of the failure-detection function, requiring those call sites to know what the policy is. The condition system pushes the policy out to the caller, where it belongs, without changing the API of the failure-detection function.

Where the condition system does not help

Resource cleanup on unwinding. When a handler chooses to terminate (transfer control out, unwinding the stack), the cleanup story is the same as anywhere else. Common Lisp has unwind-protect, which is the Lisp try/finally, and scoped macros such as with-open-file are built on it. There is no destructor mechanism: Lisp objects are garbage-collected, as in Java. So RAII-style cleanup tied to object lifetime is not the strength here; the condition system addresses the control-flow part of exception safety, not the resource-cleanup part.
Atomicity across multiple mutations. Recovering in place doesn’t help if you’ve already mutated half the world. The use-value restart in the example works because parsing a line is mostly pure; if it weren’t, you’d still need the strong-guarantee patterns from chapter 4.
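Performance. Restart machinery has runtime cost. In typical implementations (SBCL and CCL among them), entering a restart-case or handler-bind establishes a dynamic binding, and may allocate restart objects, even when nothing is ever signaled; signaling itself, which creates a condition object and searches the active handlers, costs far more than a normal return. The costs are modest, but they are not zero.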
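The cleanup half of the story looks like any language with finally. A minimal sketch, assuming a hypothetical with-tracked-resource macro that records acquire and release events in a hypothetical *log* variable; it is built on unwind-protect in the same way with-open-file guarantees that its stream is closed:

```lisp
(defvar *log* '()) ; hypothetical event log, most recent event first

(defmacro with-tracked-resource ((name) &body body)
  ;; Evaluate NAME once, record the acquisition, and guarantee the
  ;; release on normal exit and on any non-local exit alike.
  (let ((n (gensym "NAME")))
    `(let ((,n ,name))
       (push (list :acquire ,n) *log*)
       (unwind-protect (progn ,@body)
         (push (list :release ,n) *log*)))))

(defun use-resource-and-fail ()
  (with-tracked-resource ("db")
    (error "boom")))

(defun run-and-recover ()
  (setf *log* '())
  ;; HANDLER-CASE terminates: it unwinds through the UNWIND-PROTECT,
  ;; which runs its cleanup form on the way out.
  (handler-case (use-resource-and-fail)
    (error () :recovered)))
```

(run-and-recover) returns :recovered, and *log* shows the release recorded after the acquire even though the body never finished.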

The condition system is a complement to, not a replacement for, the disciplines we’ve been discussing.

Why almost nobody uses it

If the condition system is so powerful, why is it not in every modern language?

The honest answers are partly technical and partly social:

Most languages chose termination handling first, and the design is not retrofittable. Once your runtime unwinds the stack on throw, you cannot offer in-place recovery without redesigning the runtime. Some languages (Smalltalk, Dylan) have condition-system-like mechanisms; they came from the same lineage.
The condition system is genuinely more complex to learn than try/catch. The trade-off — separating signaling, handling, and restart — adds a vocabulary and a discipline that most programmers will not learn unless they are forced to. Termination handling, for all its limitations, fits in your head in five minutes.
The Lisp ecosystem is small. The condition system is the most expressive feature of a language whose user base never reached the critical mass that would make adopting its ideas obviously profitable. The mainstream languages that have borrowed from Lisp (lambdas, closures, garbage collection, REPLs) borrowed things easier to retrofit.
Restartable signals make compiler optimization harder. A function that may signal, and then continue, is harder to inline, to reason about for parallelism, and to hoist invariant computations out of. C++ chose, deliberately, to make the no-throw path zero-cost; the condition system implies a small but pervasive cost wherever handlers and restarts are established, even when nothing is signaled.
The interactive debugger is a hard sell to teams that ship to production. Common Lisp’s “drop the user into the debugger and offer restarts” is a development-time superpower and a production-time horror. Production code typically binds the top-level handler to log-and-exit, which throws away most of the value. The condition system shines most in the development loop, which is where it began.
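A production top-level handler does not have to throw away everything. Here is a hedged sketch of one possible policy, with run-in-production as a hypothetical entry point: take a skip-line restart if the failing code offers one (checked with the standard find-restart), otherwise log the condition and return a nonzero exit status. The inner handler-bind runs first because it was established more recently; if it declines, the outer handler-case terminates.

```lisp
;; Hypothetical entry point: THUNK stands in for the real program.
;; Returns 0 on success, 1 on an unrecoverable error.
(defun run-in-production (thunk)
  (handler-case
      (handler-bind ((error
                       (lambda (c)
                         ;; Prefer in-place recovery when it is offered.
                         (let ((r (find-restart 'skip-line c)))
                           (when r
                             (format *error-output* "~&skipping: ~a~%" c)
                             (invoke-restart r))))))
        (funcall thunk)
        0)
    ;; Reached only if the handler above declined: log and exit.
    (error (c)
      (format *error-output* "~&fatal: ~a~%" c)
      1)))
```

With this policy, a parser that offers skip-line keeps going in production, logging each skipped line, while code that offers no recovery still fails fast.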

None of these are good arguments for the design choice the rest of the industry made. They are explanations.

A worked example: a strong-guarantee batch operation in Common Lisp

Here is the condition-system equivalent of the Inventory::move_item from chapter 4:

(define-condition move-failed (error)
  ((item :initarg :item :reader failed-item)
   (reason :initarg :reason :reader failed-reason)))

(defun move-item (item from to)
  (let ((removed-from-from nil)
        (added-to-to nil))
    (unwind-protect
         (handler-case
             (progn
               (bin-remove from item)
               (setf removed-from-from t)
               (bin-add to item)
               (setf added-to-to t))
           (error (c)
             ;; rollback
             (when (and removed-from-from (not added-to-to))
               (bin-add from item))
             (error 'move-failed
                    :item item :reason c)))
      ;; cleanup that runs whether we succeeded or threw
      nil)))

Two things to notice:

The structure is the same as the C++ scope-guard version. The condition system does not magically eliminate the need for two-phase commit or rollback; the resource-cleanup and invariant-preservation problem is independent of the control-flow problem.
The condition system additionally allows us to expose recovery options to callers, which try/catch cannot:

(defun move-item (item from to)
  (bin-remove from item)
  (handler-case (bin-add to item)
    (bin-full ()
      ;; Roll back first, so the strong guarantee still holds if no
      ;; handler chooses a restart and the error unwinds past us.
      (bin-add from item)
      (restart-case
          (error 'move-failed :item item :reason :destination-full)
        (force-into-overflow ()
          :report "Put the item in the overflow bin and continue."
          ;; The item is back in FROM; redo the move into overflow.
          (bin-remove from item)
          (bin-add (overflow-bin) item))
        (return-to-source ()
          :report "Leave the item in the source bin and continue."
          ;; The rollback above already put it back; nothing to do.
          nil)))))

A caller that binds a move-failed handler invoking force-into-overflow can move items in bulk without aborting the batch on the first full destination, and without changing move-item’s signature. The recovery policy lives at the call site of the bulk operation, where the policy belongs; the per-item function exposes the choice and lets the caller bind it.
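To see the batch behavior end to end, here is a self-contained sketch. The bin representation is hypothetical (a list of items with a fixed capacity), as are bin-add, bin-remove, overflow-bin, and move-all. Its move-item rolls back before signaling move-failed, so the item sits in exactly one bin whichever restart the caller picks, or if none is picked.

```lisp
(define-condition bin-full (error) ((bin :initarg :bin :reader full-bin)))
(define-condition move-failed (error)
  ((item :initarg :item :reader failed-item)
   (reason :initarg :reason :reader failed-reason)))

;; Hypothetical inventory: a bin is a list of items plus a capacity.
(defstruct bin (items '()) (capacity 2))
(defvar *overflow* (make-bin :capacity 1000))
(defun overflow-bin () *overflow*)

(defun bin-add (bin item)
  (when (>= (length (bin-items bin)) (bin-capacity bin))
    (error 'bin-full :bin bin))
  (push item (bin-items bin)))

(defun bin-remove (bin item)
  (setf (bin-items bin) (remove item (bin-items bin))))

(defun move-item (item from to)
  (bin-remove from item)
  (handler-case (bin-add to item)
    (bin-full ()
      (bin-add from item)               ; roll back before signaling
      (restart-case (error 'move-failed :item item :reason :destination-full)
        (force-into-overflow ()
          :report "Put the item in the overflow bin and continue."
          (bin-remove from item)
          (bin-add (overflow-bin) item))
        (return-to-source ()
          :report "Leave the item in the source bin and continue."
          nil)))))

;; The policy is bound once, around the whole batch.
(defun move-all (items from to)
  (handler-bind ((move-failed
                   (lambda (c)
                     (declare (ignore c))
                     (invoke-restart 'force-into-overflow))))
    (dolist (item items)
      (move-item item from to))))
```

Moving items 1, 2, and 3 into a destination with capacity 2 leaves two items in the destination and item 3 in the overflow bin, and the dolist in move-all never sees a condition.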

The same pattern can be approximated in mainstream languages by passing a callback. The difference is that the condition-system version makes the callback dynamically scoped and named, which means callers many frames up the stack can bind it once and have it picked up by every nested call to move-item, without intermediate functions having to forward it.

This is what dynamic scope is good for, and most languages do not have it because of well-known issues with dynamic scope (action at a distance, hard to type-check). The condition system uses dynamic scope exactly for the case where it is most useful and least dangerous: error handling, where the “signal sender doesn’t know the receiver” pattern is structural.
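The underlying mechanism is easy to demonstrate on its own with a special variable. In this minimal sketch (*on-failure* and the functions are hypothetical), a binding established by let around a call is visible to every function that call reaches, and disappears when the let exits; none of the intermediate functions mention the variable.

```lisp
;; DEFVAR makes *ON-FAILURE* special, i.e. dynamically scoped.
(defvar *on-failure* :abort)

(defun leaf () *on-failure*)   ; reads whichever binding is current
(defun middle () (leaf))       ; never mentions *ON-FAILURE*
(defun top () (middle))

(defun demo-dynamic ()
  (list (top)                          ; global value
        (let ((*on-failure* :retry))   ; visible to every nested call
          (top))
        (top)))                        ; binding gone after the LET exits
```

(demo-dynamic) returns (:abort :retry :abort). Handlers established by handler-bind and restarts established by restart-case have exactly this kind of dynamic extent.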

What I want you to take from this chapter

I do not expect you to switch to Common Lisp. I want you to know that:

Termination is a choice, not a fact about exception handling. Choosing termination buys simplicity and loses expressiveness. Most languages made this choice silently.
Recovery in place is possible, and where it’s possible, it sidesteps the strong-guarantee problem entirely — there’s no atomicity question if the operation never partially completed.
Dynamic-scoped policies are a powerful idea that has been mostly forgotten outside Lisp. Whenever you find yourself threading a “what to do on failure” callback through twelve layers of API, you are reinventing condition handlers, badly.
The interactive-debugger-with-restarts workflow changes what “encountering a bug” means in development. If you have never had a bug pause your program, drop you into a REPL with full access to the live state, let you inspect and fix the problem, and then resume from where the bug occurred, you do not know how dehumanizing it is to write code without that capability. Try it once. The Common Lisp implementations SBCL and CCL are free and easy to install; Practical Common Lisp by Peter Seibel is online and free; an afternoon will show you what the rest of us have lost by not having this.

The next chapter goes back to the mainstream world, where the condition system does not exist, and looks at what happens to exception safety when you add concurrency on top of it.

Exceptionally Unsafe