Let Erlang Crash

A fun, irreverent guide to the world's most indestructible programming language

View on GitHub

Chapter 25: Real-World Erlang: WhatsApp, RabbitMQ, and Friends

Theory is nice. Academic papers are lovely. But the real test of a programming language is: does anyone actually build serious stuff with it? Erlang doesn’t just pass this test — it crushes it. Some of the most demanding, highest-scale, most reliability-critical systems on the planet run on the BEAM. Let’s talk about them.


WhatsApp: The Poster Child

When Facebook acquired WhatsApp in 2014 for $19 billion, it was the largest acquisition of a venture-backed company in history. At that point, WhatsApp handled 450 million monthly users with a team of about 50 engineers.

The backend was Erlang.

The numbers:

Why Erlang worked: WhatsApp needed massive concurrency (millions of persistent connections), soft real-time message delivery, and extreme reliability. Each connection was an Erlang process. Message routing was message passing. Crash recovery was supervisors. The entire architecture mapped naturally onto BEAM primitives.

The efficiency was remarkable — WhatsApp achieved a user-to-engineer ratio that was orders of magnitude better than comparable services, largely because Erlang handled concurrency and fault tolerance at the platform level.

RabbitMQ: The Message Broker

RabbitMQ is one of the most widely deployed message brokers in the world. It implements the AMQP protocol and is used by thousands of companies for everything from microservice communication to IoT data pipelines.

Written in Erlang.

Why Erlang was the right choice:

RabbitMQ’s architecture is essentially Erlang’s concurrency model expressed as a message broker. Queues are processes. Exchanges route messages to queues. Connections are processes that talk to queue processes. It’s BEAM processes all the way down.

Ericsson: Where It All Began

Ericsson developed Erlang and still uses it extensively. Their AXD 301 ATM switch was one of the earliest major Erlang systems, achieving “nine nines” reliability — 99.9999999% uptime, or about 31 milliseconds of downtime per year.

Ericsson handles a significant portion of the world’s mobile telecom traffic on Erlang-based systems. These systems have been running — and being upgraded via hot code reloading — for decades.

Discord: Elixir on the BEAM

Discord uses Elixir (which runs on the BEAM) for their real-time messaging infrastructure. At scale, they handle millions of concurrent users across thousands of guild servers.

Their engineering blog has documented several interesting BEAM-specific solutions:

Discord demonstrates that the BEAM’s strengths extend beyond Erlang itself — the virtual machine is the platform, and both Erlang and Elixir benefit from it.

Klarna: Fintech at Scale

Klarna, one of Europe’s largest fintech companies, has used Erlang since its founding. Their payment processing system handles enormous transaction volumes with the reliability guarantees that financial services demand.

In fintech, a crash can mean lost money. Klarna’s use of Erlang’s fault-tolerance features — supervisors, process isolation, hot code upgrades — means that individual failures don’t propagate into financial errors.

CouchDB and CouchBase

CouchDB, the distributed document database, was originally written in Erlang. Its replication protocol, conflict resolution, and clustering all leveraged Erlang’s distributed computing primitives.

The design philosophy matched perfectly: CouchDB was built around eventual consistency, where nodes operate independently and sync when possible — exactly the kind of distributed system Erlang excels at.

Bet365: Real-Time Betting

Bet365, one of the world’s largest online gambling companies, uses Erlang extensively for their real-time betting platform. They need:

Sound familiar? These are exactly the requirements Erlang was designed for.

Cisco: Network Infrastructure

Cisco uses Erlang in their network management and configuration tools, including parts of their NSO (Network Services Orchestrator) platform. Network infrastructure has the same uptime requirements as telecom switches — the network can’t go down.

VerneMQ and EMQ X: MQTT Brokers

The IoT world runs on MQTT, and two of the most popular MQTT brokers are built on the BEAM:

IoT means millions of devices maintaining persistent connections, sending small messages frequently. Sound like a telecom switch? Same problem, different era.

What These Systems Have in Common

Every system above shares these characteristics:

  1. Massive concurrency — Millions of connections or concurrent operations
  2. Persistent connections — Long-lived TCP connections, not just request/response
  3. Soft real-time — Responses needed in milliseconds, not seconds
  4. High availability — Downtime is unacceptable or extremely costly
  5. Message-oriented — The core operation is routing messages between entities

These are the problems Erlang was born to solve. Not because it’s the fastest language, or the most expressive, or the most popular. But because its concurrency model, fault tolerance, and distribution primitives map directly onto these requirements.

The Pattern

If you look at successful Erlang/BEAM deployments, there’s a clear pattern:

Problem domain              → BEAM solution
─────────────────────────────────────────────
Per-connection state        → One process per connection
Message routing             → Message passing between processes
Crash isolation             → Process isolation + supervisors
Horizontal scaling          → Distributed Erlang
Zero-downtime upgrades      → Hot code reloading
Persistent connections      → Lightweight processes (millions)
Real-time responsiveness    → Preemptive scheduling, per-process GC

Each row is a hard problem in most languages. In Erlang, each one is a built-in feature.

Why Not Erlang?

Being fair, Erlang is not the best choice for everything:

Erlang is a specialist tool for distributed, concurrent, fault-tolerant systems. In that domain, nothing else comes close.

Key Takeaways

The next time someone asks “who uses Erlang?”, you can tell them: the systems they use every day. They just don’t know it.


← Previous: Testing Next: Elixir →