HTTP Alternatives for APIs
Most API code is written without a protocol decision. Engineers reach for REST over HTTP because it is what they know, what the framework assumes, and what everyone else uses. That is a reasonable default — but it is still a choice, and like any invisible choice, it has costs.
This book surveys the alternatives: what they are, how they work, where they outperform REST, and where they do not. It is organized as a practical tour rather than a reference manual. Each chapter covers one protocol or family of protocols in enough depth to understand the tradeoffs, with the goal of making protocol selection a deliberate decision rather than an accidental one.
What This Book Covers
Chapter 1 — The Default We Never Questioned. Why REST became the baseline, what it costs in performance and expressiveness, and when those costs matter enough to look elsewhere.
Chapter 2 — gRPC. Protocol Buffers and HTTP/2 under the hood: strongly-typed contracts, efficient binary serialization, and streaming primitives that REST cannot express.
Chapter 3 — WebSockets and SSE. Full-duplex connections and server-sent events: when you need the server to push data rather than wait to be asked.
Chapter 4 — MQTT and the IoT World. Publish-subscribe over constrained networks: designed for unreliable connections, minimal overhead, and millions of devices.
Chapter 5 — AMQP and Message Brokers. Enterprise messaging with guaranteed delivery, routing, and backpressure — the RabbitMQ ecosystem and when you need a broker in the middle.
Chapter 6 — ZeroMQ. Messaging without a broker: low-latency socket patterns for high-throughput systems that cannot afford the overhead of a central queue.
Chapter 7 — Cap’n Proto RPC and the Binary Frontier. Zero-copy serialization and capability-based RPC at the extreme end of the performance spectrum.
Chapter 8 — Lesser-Known Contenders. Thrift, Avro, Flatbuffers, Nano, NATS, and others that solve real problems in specific contexts.
Chapter 9 — What Nobody Has Tried Yet. Emerging directions: QUIC, WebTransport, local-first architectures, and patterns at the edge of current practice.
Chapter 10 — How to Choose. A decision framework for matching protocol to use case, with worked examples across common API scenarios.
The Default We Never Questioned
There is a particular kind of decision that engineers make without making it. Nobody sits down and evaluates REST against the alternatives, weighs the tradeoffs, and concludes that REST is the right choice for their use case. They just write a controller, annotate it with @GetMapping or decorate it with app.get(...), and move on. REST has won by becoming invisible — not by being optimal, but by being the assumed baseline from which all other choices must justify themselves.
This book is about questioning that assumption. Not because REST is bad — it is often perfectly adequate — but because “adequate” and “optimal” are different things, and the gap between them has real costs. Once you know what else exists and when it matters, you can make an actual decision instead of an accidental one.
How REST Got Here
REST — Representational State Transfer — was described by Roy Fielding in his 2000 dissertation as an architectural style for distributed hypermedia systems. The core insight was elegant: treat everything as a resource, use uniform interfaces to manipulate it, keep interactions stateless, and let the web’s existing infrastructure (caches, proxies, load balancers) do useful work.
It was a description of the web as it already worked, not a prescription for building APIs. APIs came later, and they borrowed REST’s vocabulary without always inheriting its constraints. What most people call “REST APIs” today are better described as HTTP APIs with JSON bodies — they use HTTP verbs and status codes, but they frequently violate Fielding’s constraints, omit HATEOAS entirely, and treat URLs as a naming convention rather than a hypermedia control system. That is not necessarily wrong; it is just not what Fielding described.
The practical story is simpler: HTTP was everywhere. Every language had HTTP client libraries. Every firewall understood HTTP. Every operations team knew how to proxy and load-balance it. JSON emerged as a readable, flexible data format that worked in browsers without a parsing step. The combination was frictionless in a way that nothing else was, and frictionless wins — especially when you are moving fast and the infrastructure costs of something more specialized are real.
The Costs Nobody Charges You For
REST’s ubiquity has a way of making its costs invisible. They are real, but they are baked into the baseline, so you pay them without noticing.
Request-response is the only primitive. HTTP is a synchronous request-response protocol. The client asks; the server answers. This maps naturally to fetching data, but it maps poorly to many other communication patterns: server-initiated events, streaming data, fire-and-forget notifications, bidirectional negotiation. Every time you need something other than ask-and-answer, you are working around the protocol: polling loops, long-polling hacks, webhook callbacks, SSE bolted on as an afterthought.
Connections are expensive and reused poorly. HTTP/1.1 pipelining was supposed to help with connection overhead, but browser and proxy implementations were so buggy that it was effectively abandoned. The practical result was that browsers opened six connections per host and serialized requests across them, burning three-way handshake overhead on every request to a cold connection. HTTP/2 multiplexing helped significantly, but most internal services — running over TCP between components that control both ends — are still paying connection overhead that custom protocols would not incur.
Text is readable, but not efficient. JSON’s human-readability is genuinely valuable during development. It is also genuinely wasteful at scale. Parsing JSON is not free: a service processing a million requests per second is spending a measurable fraction of its CPU budget deserializing JSON that it will immediately re-serialize when forwarding to the next tier. More subtly, JSON has no schema at the wire level, which means the recipient must validate structure at runtime and the compiler cannot help you when you add a field on one side without updating the other.
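The schema gap is easy to demonstrate. In the Python sketch below (field names invented for illustration), a consumer written against a stale field name runs without complaint; nothing at build time connects the producer's field names to the consumer's, so the mismatch surfaces only as a missing value at runtime.

```python
import json

# Producer side: serializes a payment event with the field "amount_cents".
wire = json.dumps({"id": "txn_1", "amount_cents": 1299})

# Consumer side: written against an older field name. JSON carries no schema,
# so no tool flags the mismatch; .get() just quietly returns None.
decoded = json.loads(wire)
amount = decoded.get("amount")   # None: the field is actually "amount_cents"
```

With a compiled schema, this class of drift fails at build time instead; that difference is the subject of the gRPC chapter.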
URLs are a weak contract. The URL space is not typed, not versioned, and not machine-verifiable. You can document it with OpenAPI and generate clients from that documentation, but the schema is one artifact among many rather than a compiler-enforced contract. The gap between the OpenAPI spec and the actual server behavior is bridged by tests, vigilance, and prayer. This is manageable, but it is friction that other protocols eliminate.
HTTP semantics are frequently abused. The HTTP specification defines clear semantics for GET (safe, idempotent), POST (non-idempotent create or action), PUT (idempotent replace), PATCH (partial update), DELETE (remove). In practice, many APIs use POST for everything non-trivial, return 200 for errors, and embed status information in JSON response bodies because actually reading HTTP status codes from JavaScript requires effort. The protocol’s semantic richness is largely unused.
The Myth of Universal Tooling
The usual argument for REST is tooling: curl, Postman, browsers, logging proxies, load balancers — everything understands HTTP. This is true, and it matters. But the argument is weakening.
gRPC has server reflection and grpcurl. MQTT has tools like MQTT Explorer and mosquitto_pub. Kafka has an entire ecosystem. The days when choosing a non-HTTP protocol meant building your own tooling from scratch are largely over. The tooling argument is still a real consideration — HTTP’s tooling ecosystem is larger and more mature than any alternative — but it should be weighed honestly against other factors rather than treated as a trump card.
The argument also inverts strangely when applied internally. For service-to-service communication inside a microservice architecture, the “every firewall understands HTTP” argument is irrelevant — there is no firewall between your services. The “debugging with curl” argument matters less when your real debugging workflow involves distributed traces and structured logs. Internal services have different constraints than public APIs, and they are often better served by protocols optimized for internal use.
When REST Actually Is Right
This book is not an argument against REST. It is an argument against REST by default. The distinction matters.
REST over HTTP/JSON is a strong choice when:
- You are building a public API. Developers consuming your API should not need special client libraries. JSON over HTTP is universal, readable, and explorable. The documentation burden of a structured binary protocol is not worth it for a public surface.
- Your clients are browsers. Browsers speak HTTP natively. WebSockets and SSE extend HTTP. Everything else requires a gateway or a workaround.
- Caching matters. HTTP has a sophisticated, well-understood, widely-implemented caching model. GET requests are cacheable by CDNs, reverse proxies, and browsers without any additional work. Few other protocols have anything comparable.
- Your team is small and moving fast. The operational overhead of running MQTT brokers, learning Protocol Buffers, or debugging ZeroMQ socket state is real. For a small team with limited operational capacity, REST’s simplicity is not just convenience — it is a strategic choice that keeps the system maintainable.
- Your traffic patterns fit request-response. If users are clicking buttons and waiting for results, and your data model is resources-on-a-server, REST is doing exactly what it was designed to do. Do not optimize for problems you do not have.
REST’s costs appear when these conditions do not hold: when you are building for constrained devices, when you need true bidirectionality, when you are processing millions of events per second, when you need guaranteed delivery semantics, when your system is a distributed pipeline rather than a collection of resources. These are the cases where the alternatives covered in this book offer something that REST cannot.
Reading This Book
The chapters that follow cover each major alternative in depth: what it actually is (not just what the marketing says), how it works mechanically, where it fits, and where it does not. Some of these — gRPC in particular — are widely called alternatives to HTTP while actually being built on top of it. The costume is convincing enough to merit its own chapter.
The goal is not to give you a flowchart that tells you which protocol to pick. Protocol selection is a judgment call that depends on context that no flowchart can encode. The goal is to make sure that when you make the call, you are making it with full information rather than with the vague sense that REST is what everyone does.
Everyone doing something does not make it optimal. It just makes it the default.
gRPC — HTTP/2 in a Convincing Costume
Here is the thing nobody says out loud at conferences: gRPC is not an alternative to HTTP. It is HTTP/2 with a binary framing layer and a code generator bolted on top. Every gRPC call is an HTTP/2 request. The metadata is HTTP/2 headers. The status codes are mapped to HTTP/2 trailers. The multiplexing, flow control, and connection management are all HTTP/2 doing what HTTP/2 does.
This is not a criticism. gRPC is genuinely useful, and the protocol choices behind it are sound. But understanding what gRPC actually is — rather than what the positioning implies — is prerequisite to understanding when it helps and when it does not.
What gRPC Actually Is
gRPC was open-sourced by Google in 2015 and is now a CNCF project. Google had been running its own internal RPC system called Stubby for years; gRPC was a public version built on the then-emerging HTTP/2 standard rather than a proprietary transport.
The stack has three layers:
- Protocol Buffers for interface definition and serialization. You define your service and message types in .proto files. The protoc compiler generates client stubs and server skeletons in your language of choice. On the wire, messages are binary-encoded using the Protocol Buffers encoding, which is compact and fast to parse compared to JSON.
- HTTP/2 as the transport. gRPC uses HTTP/2’s multiplexing to run multiple concurrent RPCs over a single TCP connection. Request and response metadata travel as HTTP/2 headers. Request and response bodies travel as HTTP/2 DATA frames, with a five-byte length-prefix that gRPC adds for framing within the stream.
- The gRPC protocol, a thin layer on top of HTTP/2 that specifies how to map RPC semantics — method names, status codes, error details, timeouts, deadlines — to HTTP/2 primitives.
The result is a system that is semantically much richer than JSON over HTTP/1.1, but which travels over port 443 and looks like HTTPS to network infrastructure that does not inspect payloads.
Protocol Buffers: Schema as Contract
Protocol Buffers deserves separate attention because it is doing most of the work that engineers often attribute to gRPC.
A .proto file is a machine-readable, language-independent schema:
syntax = "proto3";
package payments;
service PaymentService {
rpc Charge(ChargeRequest) returns (ChargeResponse);
rpc Stream(StreamRequest) returns (stream Event);
}
message ChargeRequest {
string idempotency_key = 1;
int64 amount_cents = 2;
string currency = 3;
string source_token = 4;
}
message ChargeResponse {
string transaction_id = 1;
ChargeStatus status = 2;
}
enum ChargeStatus {
CHARGE_STATUS_UNSPECIFIED = 0;
CHARGE_STATUS_SUCCESS = 1;
CHARGE_STATUS_DECLINED = 2;
CHARGE_STATUS_ERROR = 3;
}
From this file, protoc generates:
- A client stub in every supported language (Go, Java, Python, C++, Ruby, C#, Node.js, and many more)
- A server interface that you implement
- Serialization and deserialization code
- Type-safe accessors for all fields
The generated code is not just convenient — it is a compiler-enforced contract. If you remove a field from a message that a client still references, the client’s build breaks the next time it regenerates from the schema. When client and server are built from different schema versions, the mismatch surfaces at build or code-review time rather than as silent runtime drift. The schema is not documentation; it is a build artifact that both sides depend on.
This is qualitatively different from OpenAPI, which generates client code from a spec but cannot prevent the spec from drifting from the server implementation. With protobuf, the schema and the implementation are the same thing.
Schema evolution is handled through field numbers, not field names. The binary encoding references fields by their integer tags, not their string names. As long as you never reuse a field number and never change a field’s type, you can add new fields and remove old ones without breaking existing clients. Old clients ignore fields they do not know about; new clients get zero values for fields that old servers do not send. This is not magic — it requires discipline — but the mechanism is well-designed.
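The mechanism is concrete enough to sketch by hand. The toy Python encoder/decoder below covers only the varint corner of the Protocol Buffers wire format (real messages also use length-delimited and fixed-width wire types), but it shows why an old reader survives a new field: every tag carries the field number and wire type, so a decoder can walk past keys it has never heard of.

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 payload bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | 0x80 if n else b)
        if not n:
            return bytes(out)

def encode_field(field_number: int, value: int) -> bytes:
    # tag = (field_number << 3) | wire_type; wire type 0 is varint
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

def decode_message(buf: bytes) -> dict:
    """Decode every varint field into {field_number: value}. A reader that
    only knows field 1 simply never looks up the other keys; that is the
    entire backward-compatibility mechanism."""
    i = 0
    def read_varint():
        nonlocal i
        shift = result = 0
        while True:
            b = buf[i]
            i += 1
            result |= (b & 0x7F) << shift
            if not (b & 0x80):
                return result
            shift += 7
    fields = {}
    while i < len(buf):
        tag = read_varint()
        number, wire_type = tag >> 3, tag & 0x7
        assert wire_type == 0  # this sketch handles only varint (wire type 0)
        fields[number] = read_varint()
    return fields

# A "new" writer sends fields 1 and 2; an "old" reader built before field 2
# existed decodes the same bytes and just ignores the key it does not know.
msg = encode_field(1, 1299) + encode_field(2, 7)
decoded = decode_message(msg)
```

Note that the wire carries no field names at all, only numbers — which is exactly why reusing a field number is the unforgivable sin.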
The Four Streaming Modes
gRPC supports four interaction patterns, which is one of its genuine advantages over REST:
Unary RPC is the standard request-response: one request, one response. This is what REST gives you, mapped to HTTP/2.
rpc Charge(ChargeRequest) returns (ChargeResponse);
Server streaming allows the server to send a sequence of responses to a single request. Useful for progress updates, large result sets, or push notifications.
rpc WatchEvents(WatchRequest) returns (stream Event);
Client streaming allows the client to send a sequence of requests before receiving a single response. Useful for bulk uploads or aggregation.
rpc BatchImport(stream ImportRecord) returns (ImportSummary);
Bidirectional streaming allows both sides to send sequences of messages independently over a single connection. The semantics are not quite arbitrary bidirectional messaging — the server stream is still initiated by a client request — but it is close enough for most purposes.
rpc Chat(stream Message) returns (stream Message);
All four modes are built on HTTP/2 streams. The unary mode uses a single HTTP/2 request-response cycle. The streaming modes use HTTP/2 stream multiplexing to keep the logical stream open while messages flow.
What HTTP/2 Brings
Since gRPC is HTTP/2, it inherits HTTP/2’s properties directly:
Multiplexing. A single TCP connection carries multiple concurrent RPCs without head-of-line blocking at the HTTP layer. With HTTP/1.1, if you had 100 concurrent requests you needed 100 connections (in practice, six, serialized). With HTTP/2, you need one.
Header compression. HTTP/2 HPACK compresses headers using a combination of static and dynamic tables. Repeated headers (Content-Type, method names, common metadata) are transmitted as indexed references rather than repeated strings. For workloads with many small requests, this meaningfully reduces overhead.
Binary framing. HTTP/2 frames are binary, not text. This is more efficient to parse and less ambiguous than HTTP/1.1’s text-based format, which has been a source of security vulnerabilities (request smuggling, response splitting) that arise from inconsistent parsing.
Flow control. HTTP/2 has per-stream and per-connection flow control windows. This prevents a fast sender from overwhelming a slow receiver without dropping the connection.
What HTTP/2 does not help with: TCP head-of-line blocking. HTTP/2 solves head-of-line blocking at the HTTP layer, but TCP is still a single ordered stream. If one packet is lost, all streams on that connection stall until the loss is recovered. HTTP/3 over QUIC solves this by using UDP with per-stream loss recovery. gRPC over QUIC exists but is not yet standard.
The Things That Hurt
The convincing costume starts to slip in a few specific places.
Load balancers are confused. Traditional L4 load balancers distribute connections. With HTTP/1.1, where each connection carries a small number of requests, connection-level load balancing is a reasonable proxy for request-level load balancing. With HTTP/2, where a single long-lived connection carries many multiplexed requests, connection-level load balancing fails completely: all traffic from a client will go to whichever backend handled the connection establishment, and other backends will be idle.
gRPC requires L7 load balancing — a proxy that understands the HTTP/2 framing and can route individual requests to different backends. This means you need Envoy, or nginx with HTTP/2 upstream support, or a service mesh, or client-side load balancing with a gRPC-aware client library. These are solvable problems, but they are real operational complexity that does not exist with HTTP/1.1.
Browsers cannot use gRPC directly. The browser’s fetch API does not expose HTTP/2 trailers, which gRPC uses to carry status codes, and no browser API gives JavaScript low-level control over HTTP/2 framing. The result is that you cannot call a gRPC service from browser JavaScript using the standard gRPC protocol.
gRPC-Web is the workaround. It is a modified protocol that uses HTTP/1.1 or HTTP/2, encodes trailers differently, and works through a translation proxy (typically Envoy). It works, but it requires the proxy, it does not support client streaming, and it adds complexity. If your primary client is a browser, gRPC-Web’s operational overhead may not be worth it.
Debugging requires tools. curl understands HTTP. It does not speak Protocol Buffers. A gRPC request captured by Wireshark or a logging proxy looks like opaque binary data. You need grpcurl or a gRPC-aware debugging tool to inspect traffic. This is manageable, but it changes the character of debugging and makes ad-hoc exploration harder.
Error handling is often re-invented. gRPC has a small set of status codes (OK, CANCELLED, UNKNOWN, INVALID_ARGUMENT, and about a dozen others). These cover the common cases but are coarser than HTTP’s rich status code vocabulary. In practice, many gRPC services embed more detailed error information in response messages or in google.rpc.Status details, which means you end up with two error-handling systems anyway.
Proto evolution requires discipline. The schema evolution story is good, but it requires that everyone follow the rules: never reuse field numbers, never change field types, always use zero-valued defaults for missing fields. On a small team or a codebase with strong conventions, this is fine. On a large team with many contributors, field number management becomes a coordination problem, and the backward-compatibility guarantee is only as strong as the discipline enforcing it.
When gRPC Is the Right Choice
Despite the above, gRPC is genuinely excellent in specific contexts.
Microservice-to-microservice communication inside a controlled infrastructure is gRPC’s strongest use case. You control both ends. Browsers are not involved. The schema contract eliminates the API versioning headaches that plague JSON-based services. Code generation removes the class of bugs where the client and server have drifted. The operational complexity of L7 load balancing is worth it for a system with hundreds of internal services.
Polyglot environments benefit enormously from the generated client libraries. If your system has services written in Go, Java, Python, and Rust, the alternative to generated clients is maintaining hand-written clients in all four languages, keeping them in sync as the API evolves. protoc does this for you.
High-throughput services where serialization overhead matters. Protocol Buffers encoding is meaningfully faster to serialize and deserialize than JSON, and smaller on the wire. For services processing millions of requests per second, this matters. For services handling a few hundred requests per second, it probably does not.
Streaming use cases where you need server push, client streaming, or true bidirectional streaming without the complexity of WebSockets. gRPC streaming is built into the protocol and integrated with the type system; it is not an afterthought.
The Honest Summary
gRPC is HTTP/2 with a schema language, code generation, and a binary serialization format. These are genuinely valuable additions. The schema language gives you compiler-enforced contracts. The code generation eliminates a class of serialization bugs. The binary format is faster and smaller than JSON.
The transport is still HTTP/2, which means you get HTTP/2’s benefits (multiplexing, header compression, binary framing) and HTTP/2’s operational requirements (L7 load balancing, TLS). The performance improvements over HTTP/1.1 + JSON are real but often overstated; the operational complexity is real and often understated.
If you are building internal services in a microservice architecture, gRPC is probably the right default. If you are building a public API that browsers consume, you should think carefully before choosing gRPC-Web over a well-designed HTTP/JSON API. If you need transport that is not HTTP at all — for constrained devices, for message queues, for pub/sub semantics — gRPC does not help you and the subsequent chapters will.
WebSockets and SSE — When You Need the Server to Talk Back
The request-response model has a fundamental asymmetry: the client always initiates. The server can respond richly, but it cannot reach out first. For most of human history with the web, this was fine — users clicked things, servers answered. But modern applications broke this assumption. Chat applications need instant message delivery. Dashboards need live data. Multiplayer games need continuous state synchronization. Collaborative editors need sub-second propagation of every keystroke.
Two standards emerged from this need: WebSockets and Server-Sent Events. They solve the same surface-level problem — getting data from server to client without polling — but they solve it differently and are suited for different applications. Choosing between them is not obvious, and the “just use WebSockets for everything” instinct that many engineers carry is often wrong.
The Polling Era and Why It Was Terrible
Before persistent connections, the two approaches were polling and long-polling.
Polling is the naive solution: the client sends a request every N seconds asking “anything new?” This is simple, stateless, and compatible with every HTTP infrastructure ever built. It is also wasteful in both directions — the server does work answering “no, nothing” requests, and the client experiences latency up to N seconds between an event occurring and it being delivered.
Long-polling is the clever hack: the client sends a request, and the server holds the connection open until it has something to say (or until a timeout forces a response). The client immediately re-connects after receiving a response. This delivers events with much lower latency than regular polling, but it still opens a new connection for each event cycle, burning TCP handshake overhead, and it creates subtle problems with connection limits, proxy timeouts, and load balancer behavior.
Both of these work. Companies ran real-time applications on long-polling for years. But they are workarounds for a protocol constraint, and the constraints are visible in the engineering required to make them reliable.
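The shape of a long-polling client is worth seeing once. Here is a minimal Python sketch with the transport injected as a callable, so the control flow is visible without a network; the fetch function and its since-cursor contract are assumptions for illustration, not any particular library's API.

```python
def long_poll_loop(fetch, handle, max_cycles):
    """fetch(since) is assumed to block server-side until an event newer than
    `since` exists, returning (event_id, payload), or None on a quiet timeout.
    The client's job is just: ask, deliver, remember the cursor, ask again."""
    since = 0
    for _ in range(max_cycles):
        result = fetch(since)
        if result is None:          # server timed out with nothing new
            continue
        event_id, payload = result
        handle(payload)
        since = event_id            # resume after the last delivered event

# Fake transport standing in for the HTTP round trip.
events = [(1, "hello"), (2, "world")]
def fake_fetch(since):
    newer = [e for e in events if e[0] > since]
    return newer[0] if newer else None

received = []
long_poll_loop(fake_fetch, received.append, max_cycles=3)
# received == ["hello", "world"]
```

Everything that makes real long-polling hard lives outside this loop: proxy timeouts shorter than the hold time, reconnect backoff, and deciding how long the server keeps undelivered events.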
WebSockets
WebSockets, standardized as RFC 6455 in 2011, replace the HTTP request-response model with a persistent, full-duplex, bidirectional channel. Either side can send a message to the other at any time, without the overhead of initiating a new request.
The Handshake
A WebSocket connection starts as an HTTP/1.1 request with an Upgrade header:
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
The server accepts the upgrade:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
After the 101 response, the HTTP protocol is abandoned and the connection becomes a WebSocket connection. The TCP socket remains open, but the framing, multiplexing, and message boundary semantics are all defined by the WebSocket protocol rather than HTTP.
The Sec-WebSocket-Key / Sec-WebSocket-Accept exchange is not security — it is a sanity check to prevent the connection from being accidentally established by infrastructure that does not understand WebSockets. The server concatenates the client’s key with a fixed GUID, hashes the result with SHA-1, and base64-encodes it. This proves that the server read the header, which is enough to prevent most accidental upgrades.
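The computation is small enough to show whole. In Python, using the example key from the handshake above (the GUID is the fixed value from RFC 6455):

```python
import base64
import hashlib

# Fixed GUID specified by RFC 6455; every conforming server uses this string.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
# accept == "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```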
The Frame Format
WebSocket messages are sent as frames. Each frame has a header of 2-14 bytes followed by the payload:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len | Extended payload length |
|I|S|S|S| (4) |A| (7) | (16/64) |
|N|V|V|V| |S| | (if payload len==126/127) |
| |1|2|3| |K| | |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
Key fields:
- FIN indicates whether this is the last frame in a message (messages can be fragmented)
- Opcode identifies the frame type: 0x1 (text), 0x2 (binary), 0x8 (close), 0x9 (ping), 0xA (pong)
- MASK indicates whether the payload is masked. Client-to-server frames must be masked; server-to-client frames must not be
- Payload len encodes the payload length with a compact three-case encoding: 0-125 directly, 126 means the next 2 bytes hold the length, 127 means the next 8 bytes hold the length
The masking requirement for client-to-server frames is a security measure against proxy cache poisoning: a malicious client could craft WebSocket messages that, when interpreted as HTTP responses, poison an intermediary’s cache. Random masking prevents this attack.
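A decoder for this header fits in a page. The Python sketch below covers the common cases only — no extensions, no fragmentation reassembly — but it exercises the three-case length encoding and the unmasking step described above.

```python
import struct

def parse_frame(buf: bytes):
    """Parse one complete WebSocket frame; returns (fin, opcode, payload)."""
    fin = bool(buf[0] & 0x80)
    opcode = buf[0] & 0x0F
    masked = bool(buf[1] & 0x80)
    length = buf[1] & 0x7F
    offset = 2
    if length == 126:                        # next 2 bytes hold the length
        length = struct.unpack(">H", buf[2:4])[0]
        offset = 4
    elif length == 127:                      # next 8 bytes hold the length
        length = struct.unpack(">Q", buf[2:10])[0]
        offset = 10
    if masked:                               # client-to-server frames only
        mask = buf[offset:offset + 4]
        offset += 4
        raw = buf[offset:offset + length]
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(raw))
    else:
        payload = buf[offset:offset + length]
    return fin, opcode, payload

# An unmasked server-to-client text frame carrying "hi":
# 0x81 = FIN set + opcode 0x1 (text); 0x02 = not masked, payload length 2.
fin, opcode, payload = parse_frame(bytes([0x81, 0x02]) + b"hi")
# (True, 1, b"hi")
```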
What WebSockets Give You
Full duplex means both sides can send and receive simultaneously on the same connection. A chat server can push new messages to a client while the client is in the middle of composing a message, without any coordination required.
The message abstraction is useful. Unlike raw TCP, WebSockets have a notion of a complete message (which may span multiple frames). Your application code receives complete messages, not partial payloads.
No request overhead on each message. After the initial handshake, sending a message is just TCP write + WebSocket frame header. No HTTP headers, no method, no URL, no status code.
The protocol is also application-agnostic. You can carry any payload: JSON, Protocol Buffers, MessagePack, raw binary, whatever your application uses. WebSockets do not care about the content; they just deliver frames.
What WebSockets Do Not Give You
WebSockets do not give you a message format, a serialization protocol, or any application-level semantics. You get a bidirectional byte pipe with message boundaries. What you do with it is entirely up to you.
This means every WebSocket application invents its own sub-protocol for things like:
- How does the server indicate which type of message this is?
- How does the client subscribe to specific event streams?
- How does request-response get modeled over a bidirectional channel?
- What happens when the client reconnects after a dropped connection?
- How are undelivered messages replayed?
None of this is specified. Some applications use JSON with a type field and a discriminated union pattern. Some use libraries like Socket.IO that layer an application protocol on top of WebSockets (and fall back to long-polling if WebSockets are unavailable). Some roll their own binary framing. The result is that WebSocket interoperability requires both sides to agree on an application protocol, which is typically documented nowhere formally.
Scaling WebSockets
This is where most WebSocket-naive implementations encounter their first serious surprise.
HTTP is stateless. Any request can be handled by any server behind a load balancer. WebSockets are stateful: the connection is persistent and the server holds per-connection state. If a client is connected to server A, all messages to that client must be delivered through server A — or server A must be able to route messages to the client, which requires a pub/sub layer across servers.
The standard approach is a Redis pub/sub layer (or equivalent). When server A needs to deliver a message to a client connected to server B, it publishes to a channel; server B is subscribed to that channel and delivers the message over its connection. This works, but it adds operational complexity and a hop.
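The pattern reduces to very little code. Below is an in-process Python stand-in for the cross-server pub/sub layer; the channel naming and message shape are invented for illustration, and a real deployment would use Redis or an equivalent so the publish crosses machine boundaries.

```python
class Broker:
    """Stand-in for the Redis pub/sub layer: each WebSocket server
    subscribes to channels for the clients connected to it."""
    def __init__(self):
        self.subscribers = {}           # channel -> list of callbacks
    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)
    def publish(self, channel, message):
        for cb in self.subscribers.get(channel, []):
            cb(message)

broker = Broker()

# Server B holds the open socket for user "alice"; its delivery hook would
# write the message down that socket. Here a list records the deliveries.
delivered = []
broker.subscribe("user:alice", delivered.append)

# Server A receives a message destined for alice and publishes it; the
# broker fans it out to whichever server actually holds her connection.
broker.publish("user:alice", {"from": "bob", "text": "hi"})
```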
Load balancers must use sticky sessions (IP hash or cookie-based affinity) or understand WebSockets well enough to route by connection. AWS ALB, nginx, and HAProxy all handle WebSocket connections correctly with appropriate configuration; the issue is usually that default configurations do not.
Connection counts matter differently than with HTTP. A REST server handling 10,000 requests per second can do so over a pool of connections that is much smaller than 10,000, because connections are short-lived. A WebSocket server may have 100,000 simultaneously open connections, each representing a client that has not disconnected. Operating system limits on file descriptors, memory per connection, and TCP state become relevant at scale.
Server-Sent Events
Server-Sent Events (SSE) is the less famous sibling: a standard for server-to-client streaming over plain HTTP, defined in the HTML specification. Where WebSockets are full-duplex and transport-layer, SSE is unidirectional and application-layer.
The SSE Model
An SSE connection starts as a standard HTTP GET request. The server responds with Content-Type: text/event-stream and keeps the connection open, sending text-formatted events as they occur:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"type": "price_update", "symbol": "AAPL", "price": 182.47}

data: {"type": "price_update", "symbol": "AAPL", "price": 182.51}

id: 42
event: alert
data: {"message": "Circuit breaker triggered", "symbol": "GOOG"}
The event format is simple:
- data: lines carry the payload. Multiple data: lines for one event are concatenated with newlines.
- id: sets the event ID, which the client uses for reconnection.
- event: sets a named event type (optional; the default is message).
- retry: suggests a reconnection delay in milliseconds.
- A blank line terminates an event.
That is the entire protocol. It is so simple you could implement it in a few lines of any language.
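As a sketch of that claim, here is a minimal SSE event serializer in Python. The function name and signature are this book's invention, not any library's API:

```python
def format_sse(data, event=None, event_id=None, retry=None):
    """Serialize one SSE event. `data` may span multiple lines."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    if retry is not None:
        lines.append(f"retry: {retry}")
    # Each line of the payload becomes its own data: field;
    # the client joins them back together with newlines.
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    lines.append("")  # blank line terminates the event
    return "\n".join(lines) + "\n"
```

Write the output of this function to an open text/event-stream response, flush, and you have a working SSE server.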
Auto-Reconnection
SSE has built-in reconnection semantics that WebSockets do not. The browser’s EventSource API automatically reconnects when the connection drops, sending the Last-Event-ID header with the ID of the last received event. The server can use this to replay missed events from where the client left off.
This is enormously useful for reliability. A WebSocket application must implement its own reconnection logic, including deciding how to detect that the connection is dead (ping/pong, application-level heartbeats), how to track the last received state, and how to replay messages after reconnection. SSE gives you this for free.
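The server side of Last-Event-ID replay can be sketched in a few lines, assuming monotonically increasing integer event IDs and an in-memory ring buffer (both are illustrative choices, not requirements of the protocol):

```python
from collections import deque

class EventLog:
    """Keeps recent events so a reconnecting client can catch up."""
    def __init__(self, maxlen=1000):
        self.events = deque(maxlen=maxlen)  # (id, data) pairs; oldest dropped first
        self.next_id = 1

    def append(self, data):
        self.events.append((self.next_id, data))
        self.next_id += 1

    def since(self, last_event_id):
        # On reconnect, replay everything after the client's Last-Event-ID.
        return [(i, d) for (i, d) in self.events if i > last_event_id]

log = EventLog()
for payload in ("a", "b", "c"):
    log.append(payload)
```

On each reconnect, the handler reads the Last-Event-ID request header, sends `log.since(last_id)` first, then resumes live streaming. A bounded buffer means very old clients may miss events; a production system would fall back to a full state resync in that case.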
The Browser EventSource API
const source = new EventSource('/api/events');
source.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Received:', data);
};
source.addEventListener('alert', (event) => {
showAlert(JSON.parse(event.data));
});
source.onerror = (error) => {
// EventSource will automatically reconnect; this handler fires on errors
console.error('SSE error:', error);
};
The EventSource API does not expose the reconnection logic; it just handles it. The application code sees a stream of events.
SSE’s Actual Limitations
HTTP/1.1 connection limit. Browsers cap concurrent HTTP/1.1 connections to the same origin at six. SSE holds a connection open for its lifetime, so it permanently occupies one of those slots; a page that opens several SSE connections can exhaust the limit and stall other requests. HTTP/2 alleviates this significantly: all streams multiplex over a single connection, and the per-origin stream limit is far higher (commonly 100).
No binary support in the browser API. The EventSource API only handles text. If you want to send binary data, you must base64-encode it or use another mechanism. This is a browser-API limitation, not a protocol limitation — the SSE protocol is text-based anyway.
Unidirectional. SSE carries data from server to client only. The client sends requests through normal HTTP. For most push-notification use cases, this is fine; for cases where the client needs to send a stream of data to the server, SSE cannot help.
Some proxies buffer the response. Nginx’s default buffering will collect the SSE response body before forwarding it to the client, which breaks the streaming semantics entirely. You must configure proxy_buffering off for SSE endpoints. Similar issues exist with other proxy software.
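A typical nginx configuration for an SSE endpoint looks like the following. The location path and upstream name are placeholders; the directives are standard nginx:

```nginx
location /api/events {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";   # allow a persistent upstream connection
    proxy_buffering off;              # forward each event immediately
    proxy_cache off;
    proxy_read_timeout 1h;            # don't kill long-lived but healthy streams
}
```

The `proxy_read_timeout` value is a judgment call: too short and healthy streams get cut; clients will reconnect either way thanks to EventSource.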
Choosing Between WebSockets and SSE
The choice is often clearer than it appears.
Use SSE when:
- You primarily need server-to-client push (notifications, live feeds, dashboards)
- You want built-in reconnection with state recovery
- You want to use standard HTTP infrastructure (CDNs, proxies, load balancers without sticky sessions)
- Your clients are browsers and you value simplicity
- You do not need to send a stream of data from client to server
Use WebSockets when:
- You need true bidirectional messaging (chat, collaborative editing, multiplayer games)
- You have high message volume in both directions
- You need binary streaming (audio, video, game state)
- You are not limited to browsers and want more control over the framing
The common failure mode is choosing WebSockets when SSE would have been sufficient. WebSockets are more complex to scale, require sticky sessions or a pub/sub layer, have no built-in reconnection, and require you to design your own application protocol. If you are building a live dashboard that pushes server state to passive viewers, SSE is half the operational complexity and twice the reliability for exactly that use case.
WebSockets win when the server and client are participants in a symmetric conversation — both sides initiating messages, both sides listening. SSE wins when the relationship is fundamentally asymmetric: the server has information; the client wants to receive it.
A Note on Socket.IO
Socket.IO is frequently described as a WebSocket library but is better understood as an abstraction layer over multiple transports, with WebSockets being the preferred transport and long-polling as a fallback. It adds rooms, namespaces, automatic reconnection, acknowledgments, and broadcasting to the raw WebSocket model.
Socket.IO is not the WebSocket protocol. A standard WebSocket client cannot connect to a Socket.IO server; a Socket.IO client cannot connect to a standard WebSocket server. This matters if you are designing an API that non-browser clients will consume — you are committing to Socket.IO’s protocol on both sides.
The library is a reasonable choice for browser-heavy applications where the application semantics it provides (rooms, broadcasts, namespaced channels) match your needs. It is a poor choice if you want protocol interoperability or if you are running in an environment where the fallback to long-polling is not needed (most modern mobile clients support WebSockets natively).
Operational Reality
Both WebSockets and SSE are harder to operate than REST endpoints in specific ways.
Connection state means your autoscaling story changes. With stateless HTTP, adding or removing servers is transparent. With persistent connections, removing a server means dropping all its connections. Clients will reconnect, but they will experience an interruption. Blue/green deployments require graceful connection draining rather than hard cutover.
Both protocols interact oddly with TLS termination at load balancers. WebSockets require the load balancer to forward the upgraded connection, not just proxy HTTP/1.1. SSE requires the response not to be buffered. These are configured per-server, and the defaults are usually wrong.
Monitoring connection counts, message rates, and connection lifetimes requires tooling that most generic HTTP monitoring systems do not provide. You will need to instrument your application code.
None of these problems are unsolvable. They are just problems that do not exist with stateless REST, and you should plan for them before they surprise you in production at 2 AM.
MQTT and the IoT World
In 1999, Andy Stanford-Clark at IBM and Arlen Nipper at Arcom designed a protocol to monitor oil pipelines over satellite links — connections that were expensive per byte, unreliable, and intermittent. The protocol they designed used publish/subscribe messaging, fit control packets into a minimum of two bytes, and could maintain a session across disconnections. They called it MQTT (originally MQ Telemetry Transport, after IBM's MQ messaging products; the expansion was later dropped).
Twenty-five years later, MQTT is the dominant protocol for IoT devices. It runs on microcontrollers with 32KB of RAM, over cellular connections that lose signal when the device drives through a tunnel, in deployments that range from single-device hobby projects to millions of sensors reporting in simultaneously. If you are building anything with constrained devices, unreliable networks, or fire-and-forget telemetry, you should understand MQTT before you reach for HTTP.
The Architecture
MQTT is a publish/subscribe protocol built on top of TCP. The architecture has three components:
Clients — publishers, subscribers, or both. A client can be a sensor publishing temperature readings, a mobile app subscribing to notifications, or a backend service that both subscribes to events and publishes commands.
Broker — the central message hub. Clients do not communicate directly; they connect to the broker, which routes messages between them. Popular brokers include Eclipse Mosquitto (lightweight, open source, runs on a Raspberry Pi), EMQ X (high-throughput, clustering support, HTTP and WebSocket gateways), HiveMQ (enterprise-oriented, cluster-native), and cloud-hosted brokers like AWS IoT Core and Azure IoT Hub.
Topics — hierarchical string identifiers that organize the message space. A topic like sensors/building-a/floor-3/temperature describes exactly what it sounds like. Topics are not pre-registered; they spring into existence when a client publishes to them.
The publish/subscribe model decouples publishers from subscribers in time and space. A temperature sensor does not know or care how many services are subscribed to its readings. A dashboard service does not know or care how many sensors are contributing to the feed. This decoupling is architecturally valuable and is one of the things HTTP request-response makes difficult.
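Subscriptions may use wildcards: + matches exactly one topic level, and # matches the remainder of the topic. The matching rule is small enough to state directly in code; the function name is ours, not a library's:

```python
def topic_matches(pattern, topic):
    """Match an MQTT topic filter against a concrete topic.
    '+' matches exactly one level; '#' matches this level and below."""
    p, t = pattern.split("/"), topic.split("/")
    for i, part in enumerate(p):
        if part == "#":
            return True            # absorbs everything from here down
        if i >= len(t):
            return False           # pattern is longer than the topic
        if part != "+" and part != t[i]:
            return False
    return len(p) == len(t)        # no leftover topic levels
```

Note that sensors/# matches sensors itself, per the spec: # covers the parent level as well as its children.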
The Packet Format
MQTT’s packet format reflects its constrained-network origins. Every packet starts with a fixed header of at least two bytes:
Byte 1: [Message Type (4 bits)] [Flags (4 bits)]
Byte 2+: Remaining Length (variable-length encoding, 1-4 bytes)
The remaining length uses a variable-length encoding (the same varint scheme that Protocol Buffers later popularized): each byte contributes 7 bits of the length value, and the high bit indicates whether another byte follows. This allows lengths up to 268,435,455 bytes to be encoded in at most 4 bytes, while short messages (the common case) use a single byte for the length.
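The encoding is easy to pin down in code. A sketch, not taken from any MQTT library:

```python
def encode_remaining_length(n):
    """MQTT remaining-length varint: 7 bits per byte, high bit = continue."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n:
            byte |= 0x80   # more bytes follow
        out.append(byte)
        if not n:
            return bytes(out)

def decode_remaining_length(data):
    """Return (length, bytes_consumed) from the start of `data`."""
    value, multiplier = 0, 1
    for i, byte in enumerate(data):
        value += (byte & 0x7F) * multiplier
        multiplier *= 128
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("malformed remaining length")
```

A payload length of 127 or less costs one byte; the maximum, 268,435,455, costs four.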
Compare this to HTTP: a minimal HTTP/1.1 GET request is typically 250+ bytes just for the method, path, host, and required headers. An MQTT PUBLISH packet for a small payload is around 7-10 bytes of overhead. For a device sending a temperature reading every second over a cellular connection where data costs money per kilobyte, this difference accumulates.
Quality of Service Levels
MQTT’s three QoS levels are one of its most important features, and one of the most frequently misunderstood.
QoS 0 — At most once (fire and forget). The broker delivers the message at most once, with no acknowledgment. If the connection drops during delivery, the message is lost. Overhead: a single packet (PUBLISH). Appropriate for high-frequency sensor data where individual readings are not critical — if you lose a temperature reading, the next one arrives in a second.
QoS 1 — At least once. The sender stores the message until it receives a PUBACK acknowledgment. If the connection drops before the acknowledgment arrives, the sender retransmits the message (with the DUP flag set). This guarantees delivery but may deliver duplicates. Overhead: one round trip (PUBLISH → PUBACK). The receiver must handle duplicates by making its processing idempotent or by tracking message IDs.
QoS 2 — Exactly once. A four-way handshake ensures the message is delivered exactly once. PUBLISH → PUBREC → PUBREL → PUBCOMP. The sender holds the message until PUBCOMP; the broker holds a message ID reservation until PUBREL. Overhead: two round trips (four packets). Appropriate for cases where duplication causes problems (commands, financial transactions) but higher latency is acceptable.
The choice of QoS level is not just about reliability — it is about the tradeoff between reliability, latency, and overhead. A sensor that sends 100 readings per second probably wants QoS 0. A control command that turns off a piece of industrial equipment probably wants QoS 2.
One important subtlety: QoS levels apply hop-by-hop, not end-to-end. A publisher's QoS governs the leg from publisher to broker; a subscriber's subscription QoS governs the leg from broker to subscriber. The effective end-to-end guarantee is the minimum of the two: a QoS 2 publish delivered to a QoS 1 subscription yields QoS 1 delivery.
Retained Messages
A retained message is a flag on a PUBLISH packet that tells the broker: “store this message, and deliver it immediately to any client that subscribes to this topic in the future.” The broker keeps one retained message per topic.
This solves a common IoT problem: a dashboard connects to the broker and subscribes to sensors/building-a/floor-3/temperature. Without retained messages, the dashboard sees nothing until the sensor sends its next reading, which might be 60 seconds away. With a retained message, the broker immediately delivers the last known value, and the dashboard shows current data from the moment it connects.
Retained messages are not message history — the broker stores only the most recent one. For historical data, you need a separate storage tier that a subscriber writes into.
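Broker behavior for retained messages fits in a few lines. This is a toy model of the semantics, not any broker's implementation, and it ignores wildcard subscriptions for brevity:

```python
class RetainedStore:
    """Toy model: the broker keeps at most one retained message per topic."""
    def __init__(self):
        self.retained = {}

    def on_publish(self, topic, payload, retain=False):
        if not retain:
            return
        if payload:
            self.retained[topic] = payload     # replace the previous value
        else:
            self.retained.pop(topic, None)     # empty retained payload clears it

    def on_subscribe(self, topic):
        # Delivered immediately to a new subscriber, before any live traffic.
        return self.retained.get(topic)

store = RetainedStore()
topic = "sensors/building-a/floor-3/temperature"
store.on_publish(topic, b"21.5", retain=True)
first_seen = store.on_subscribe(topic)         # last known value, instantly
store.on_publish(topic, b"", retain=True)      # publishing empty clears it
after_clear = store.on_subscribe(topic)
```

The empty-payload-clears-it behavior is part of the protocol: it is how a publisher deletes a stale retained value.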
Last Will and Testament
When a client connects to the broker, it can specify a “will” message: a topic, payload, QoS, and retain flag that the broker should publish if the client disconnects unexpectedly (as opposed to a clean disconnect). This is the Last Will and Testament (LWT) mechanism.
LWT is how MQTT handles device failure notification. A sensor connects and specifies: “if I disconnect unexpectedly, publish offline to sensors/building-a/floor-3/status.” Any monitoring system subscribed to the status topic will see the offline notification within seconds of the sensor losing power or connectivity, without polling or any other explicit failure detection.
This is genuinely elegant — failure detection is built into the connection protocol, not the application layer. HTTP has no equivalent primitive; implementing similar behavior requires heartbeat endpoints, health checks, and monitoring systems.
MQTT 5.0
MQTT 3.1.1 was the dominant version for many years. MQTT 5.0, published in 2019, added significant capabilities that address practical pain points.
Reason codes. MQTT 3.1.1 had limited error reporting — a connection could be refused with a handful of fixed codes. MQTT 5.0 expands reason codes for every packet type, making it possible to understand why an operation failed.
User properties. Key-value pairs that can be attached to any packet. This allows application-level metadata to travel with messages without embedding it in the payload — useful for correlation IDs, schema versions, content types, and other routing metadata.
Message expiry interval. A time-to-live for messages. The broker discards messages older than the expiry interval rather than delivering stale data. Critical for IoT: a message telling an HVAC system to lower the temperature is worthless if it arrives three hours late.
Subscription identifiers. Allows a subscriber to tag its subscriptions with identifiers, which the broker includes when delivering matching messages. Useful when a single client has multiple wildcard subscriptions and needs to know which one matched.
Shared subscriptions. Multiple clients can subscribe using the same “shared subscription” group, and the broker load-balances messages across them round-robin. This enables competitive consumers for horizontal scaling — previously, every subscriber received every message, which made it difficult to parallelize processing.
Response topic and correlation data. Allows request-response patterns over MQTT. The publisher includes a response topic; the receiver publishes its response to that topic with the same correlation data. This is not native request-response (MQTT is fundamentally pub/sub), but it makes request-response patterns possible without layering a separate protocol.
Where MQTT Falls Short
MQTT is designed around specific tradeoffs. Understanding them prevents misuse.
The broker is a single point of failure. Every client connects to the broker. If the broker goes down, all communication stops. High-availability broker deployments require clustering (both EMQX and HiveMQ support this) or active-passive failover. This is manageable but adds operational complexity.
No native request-response. MQTT is publish/subscribe. MQTT 5.0’s response topic feature enables the pattern, but it is built on top of pub/sub rather than native to the protocol. If your use case is fundamentally request-response — “send a command, get a reply, know whether it succeeded” — MQTT requires more boilerplate than HTTP or gRPC would.
Topic design requires care. Topics are hierarchical strings, and the design of the topic namespace for a large system becomes a significant architectural decision. Poorly designed topic hierarchies become unmanageable; wildcard subscription patterns that match too broadly create performance problems. This is not unique to MQTT, but the lack of tooling around topic schema definition (compared to, say, OpenAPI for HTTP) makes it easy to accumulate technical debt.
Authorization is coarse. MQTT’s authentication is connection-level (username/password or TLS client certificates). Topic-level authorization — this client may publish to sensors/+ but may not subscribe to commands/+ — is supported by most brokers but configured outside the protocol, in broker-specific ways. There is no standard ACL format.
No message ordering guarantees across sessions. Within a single persistent session, messages on a given topic are ordered. Across reconnections, with queued messages flushing from multiple connections, ordering is not guaranteed. If your application requires strict message ordering across sessions, MQTT requires application-level sequence numbers.
MQTT in Practice
The constraint that MQTT is designed for is worth making concrete: a typical ESP32 microcontroller has roughly 520KB of RAM and runs, at most, a small real-time OS. It connects over WiFi using a compact embedded TCP/IP stack. It needs to report sensor readings, receive commands, and detect when it has lost connectivity. HTTP is possible but heavyweight; establishing a new TLS+HTTP connection for every reading consumes meaningful power on a battery-operated device. MQTT's persistent connection, compact packets, and built-in will messages are not just conveniences — they are what makes the application feasible.
The same attributes that make MQTT right for constrained devices also make it right for large-scale telemetry pipelines in the cloud. AWS IoT Core handles billions of MQTT messages per day, routing sensor data to DynamoDB, Lambda, S3, and other AWS services. The fan-out of a single published message to hundreds of subscribers, with the broker doing all the routing, scales naturally in ways that direct HTTP calls between services do not.
Where MQTT becomes the wrong choice is when your devices are not constrained, your network is reliable, and your communication patterns are fundamentally request-response. In that context, MQTT’s pub/sub model requires workarounds (the MQTT 5.0 response topic pattern) to do something that HTTP does natively. The broker adds latency that is unnecessary when the client and server are in the same datacenter. The appropriate choice in that context is one of the protocols covered in other chapters.
Brokers in Production
Running MQTT at scale means operating a broker. A few practical notes:
Mosquitto is the right choice for development and for single-instance deployments where high availability is not required. It is small, fast, well-documented, and widely understood. The operational model is simple.
EMQX (formerly EMQ X) is a distributed MQTT broker written in Erlang, designed for high availability and high throughput. It supports clustering natively, exposes a REST API for management, and can handle millions of concurrent connections. It is more complex to operate than Mosquitto.
HiveMQ is the enterprise option — commercial support, clustering, enterprise security features. Appropriate for large organizations where support contracts and compliance matter.
Cloud-managed brokers (AWS IoT Core, Azure IoT Hub) shift the operational burden to the provider. They handle clustering, availability, and scaling. (Google Cloud IoT Core was a third option until Google retired it in 2023.) The tradeoff is vendor lock-in, cost at scale, and limited configuration for edge cases that the managed service does not support.
The choice of broker is largely operational rather than protocol-level. The MQTT protocol works the same way regardless of which broker implements it; clients can switch brokers without code changes (beyond connection configuration).
AMQP and Message Brokers
The question that message brokers answer is not “how do I send data from A to B?” That is a networking question. The question is: “how do I decouple A from B such that A can send data even when B is down, B can process at its own rate, multiple Bs can share the work, and no messages are lost in the handoff?”
HTTP cannot answer that question. MQTT can answer parts of it. AMQP — Advanced Message Queuing Protocol — is designed to answer all of it, with formal semantics for routing, acknowledgment, durability, and transactions.
What AMQP Is
AMQP 0-9-1, published in 2008, is the protocol that RabbitMQ implements. (There is also AMQP 1.0, a different and incompatible standard published in 2011 and ISO-standardized; it is implemented by Apache ActiveMQ Artemis, Azure Service Bus, and others. When engineers say “AMQP” in the context of RabbitMQ, they almost always mean 0-9-1.)
AMQP defines a binary wire protocol with formal semantics for a set of concepts: exchanges, queues, bindings, messages, channels, consumers, and acknowledgments. The protocol is not just a transport — it specifies the behavior of the broker in enough detail that multiple independent implementations should be interchangeable for compliant clients.
In practice, the AMQP 0-9-1 specification had gaps, and broker implementations filled them differently. RabbitMQ added many extensions (publisher confirms, consumer priorities, dead letter exchanges, and others) that are widely used but not part of the base protocol. The pragmatic reality is that AMQP 0-9-1 defines the common vocabulary, and RabbitMQ’s specific behavior is what most users actually depend on.
The Broker Model
AMQP’s model has more moving parts than MQTT’s, and the parts interact in specific ways.
Exchanges receive messages from publishers. The exchange decides how to route a message based on its type and the message’s routing key.
- Direct exchange: Routes messages to queues whose binding key exactly matches the message’s routing key. Used for targeted delivery.
- Fanout exchange: Routes messages to all bound queues, ignoring the routing key. Used for broadcast.
- Topic exchange: Routes messages to queues whose binding key matches the routing key using wildcard patterns (* for a single word, # for zero or more words). Used for flexible publish/subscribe with hierarchical routing.
- Headers exchange: Routes messages based on header attributes rather than the routing key. Powerful but rarely used.
Queues store messages until they are consumed. Queues are durable (survive broker restart), auto-delete (disappear when the last consumer disconnects), or exclusive (private to a single connection). Messages in a durable queue on a durable exchange are persisted to disk, not just memory.
Bindings connect exchanges to queues. A queue is bound to an exchange with a binding key (or headers criteria for headers exchanges). The binding is what makes routing happen — without bindings, messages published to an exchange go nowhere.
Channels are virtual connections multiplexed over a single TCP connection. Each operation (publishing, consuming, acknowledging) happens on a channel. Channels are cheap to create and destroy; applications typically use one channel per thread or one per logical operation type.
This model gives you flexibility that simpler protocols do not:
A single message published to one exchange can be routed to many queues simultaneously (fanout or topic routing). Multiple services can consume from the same queue in a competing consumer pattern, processing messages in parallel without receiving duplicates (round-robin delivery). A service can consume from multiple queues and process them in priority order.
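Topic-exchange matching is worth pinning down precisely, since the syntax differs from MQTT's: words are dot-separated, * matches exactly one word, and # matches zero or more words. A sketch of the rule, not RabbitMQ's implementation:

```python
def binding_matches(binding, routing):
    """AMQP topic-exchange match: '*' = one word, '#' = zero or more words."""
    b, r = binding.split("."), routing.split(".")

    def match(i, j):
        if i == len(b):
            return j == len(r)                 # both exhausted together
        if b[i] == "#":
            # '#' may absorb zero or more words; try every split point.
            return any(match(i + 1, k) for k in range(j, len(r) + 1))
        if j < len(r) and (b[i] == "*" or b[i] == r[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

The zero-word case matters: a binding of lazy.# matches the routing key lazy itself, which surprises people coming from MQTT, where the wildcard lives at its own topic level.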
Acknowledgments and Durability
Message delivery in AMQP is explicit and acknowledgment-based.
A consumer receives a message and can:
- Acknowledge (ack): the broker removes the message from the queue permanently. Delivery succeeded.
- Negative-acknowledge (nack): the broker can either requeue the message (for another consumer to attempt) or discard it (for dead-letter routing). Delivery failed; retry or discard.
- Reject: the spec's single-message counterpart of nack. The broker requeues or discards based on the requeue flag. (nack itself is a RabbitMQ extension that can also negatively acknowledge several messages at once.)
A critical operational concept: messages that are delivered but not yet acknowledged are in an “unacknowledged” state. They cannot be delivered to another consumer while the current consumer holds them. If the consumer connection dies before acknowledging, the broker makes the message available again. This is the guarantee that prevents message loss on consumer failure.
The prefetch count (basic.qos) controls how many unacknowledged messages a consumer can hold at once. Setting it to 1 creates a strict round-robin with no buffering at consumers — the broker delivers the next message only after the current one is acknowledged. Setting it higher allows consumers to pipeline processing. The tradeoff is between throughput (higher prefetch) and fair distribution (lower prefetch, avoiding one slow consumer monopolizing the queue).
For durability:
- The exchange must be declared durable
- The queue must be declared durable
- Messages must be published with delivery_mode: 2 (persistent)
All three conditions matter. A persistent message in a non-durable queue is lost on restart. A durable queue bound to a non-durable exchange survives a restart, but the exchange and its bindings do not, so no new messages reach the queue until they are redeclared. This is a common source of confusion: engineers declare a durable queue and assume durability, not realizing they also need persistent messages.
Publisher Confirms
Basic AMQP message acknowledgment is consumer-to-broker. Publisher confirms add broker-to-publisher acknowledgment: after the broker has persisted a persistent message to disk (or enqueued a transient one), it sends a confirm to the publisher.
Without publisher confirms, publishing is fire-and-forget. If the broker restarts immediately after receiving a message but before persisting it, the message is lost. With publisher confirms, the publisher can maintain an outstanding set of published-but-unconfirmed messages and resend them if the broker connection drops before they are confirmed.
Publisher confirms are a RabbitMQ extension, not part of the base AMQP 0-9-1 spec, but they are so fundamental to reliable publishing that they are essentially universal in production RabbitMQ deployments.
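The publisher-side bookkeeping is straightforward. A sketch of the outstanding-set tracker (illustrative, not pika's or any client library's API; the `multiple` flag mirrors the one on AMQP acknowledgments, which confirms everything up to a sequence number at once):

```python
class ConfirmTracker:
    """Toy tracker for published-but-unconfirmed messages."""
    def __init__(self):
        self.seq = 0
        self.pending = {}   # seq -> message, in publish order

    def publish(self, message):
        self.seq += 1
        self.pending[self.seq] = message
        return self.seq

    def confirm(self, seq, multiple=False):
        if multiple:
            # Broker confirmed everything up to and including `seq`.
            for s in [s for s in self.pending if s <= seq]:
                del self.pending[s]
        else:
            self.pending.pop(seq, None)

    def unconfirmed(self):
        # Candidates for republishing if the connection drops.
        return list(self.pending.values())

tracker = ConfirmTracker()
for msg in ("a", "b", "c"):
    tracker.publish(msg)
tracker.confirm(2, multiple=True)   # broker acked through sequence 2
```

On reconnection, everything still in `unconfirmed()` gets republished; combined with idempotent consumers, this turns fire-and-forget publishing into at-least-once publishing.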
Dead Letter Exchanges
When a message is rejected (without requeue), or expires due to TTL, or is dropped because a queue is at its length limit, it can be routed to a dead letter exchange (DLX) rather than simply discarded. The DLX is a normal exchange that routes dead letters to a dead letter queue.
Dead letter queues are where poison messages, processing failures, and expired events go for inspection and remediation. An alert on a growing dead letter queue is one of the most valuable monitoring signals in a message-based system: it tells you that messages are failing to be processed, without losing the messages themselves.
The typical pattern is:
- Messages are published to the main exchange.
- Consumers process messages from the main queue. On unrecoverable errors, they nack without requeue.
- Dead letters route to the DLX, then to the dead letter queue.
- A separate process (or a human) inspects the dead letter queue, diagnoses the issue, and either republishes to the main queue (if the issue was transient) or archives the messages for manual review.
Without this mechanism, failed messages either disappear (if you discard failures) or cause endless retry loops (if you always requeue failures). The DLX pattern creates a clean separation between the happy path and the failure remediation path.
TTL, Priority, and Other Queue Properties
AMQP queues support several properties that control message handling:
Message TTL: Messages expire after a duration if not consumed. Prevents queue buildup of stale messages. Can be set per-queue (all messages in the queue expire after the same duration) or per-message.
Queue TTL: The queue itself expires after being unused for a duration. Useful for ephemeral queues created for temporary consumers.
Max length and max bytes: Caps the queue size. When a new message would exceed the cap, the broker either drops it to the DLX or drops the oldest message (depends on overflow setting).
Priority queues: Messages carry a priority (0-255). Higher-priority messages are delivered before lower-priority ones. Useful for urgent commands that should skip ahead of routine background work.
Lazy queues: Messages are written to disk immediately rather than kept in memory. Reduces memory pressure at the cost of higher latency. Appropriate for queues that may grow very large.
AMQP Versus Kafka
Any honest treatment of AMQP must address the Kafka comparison, because in many organizations the decision is not “AMQP or HTTP?” but “RabbitMQ or Kafka?”
They are designed for different purposes.
RabbitMQ is a message broker: it routes messages from publishers to consumers, tracks acknowledgments, and removes messages when they have been successfully processed. The unit is the message, and the model is delivery — once a consumer acknowledges, the message is gone.
Kafka is an event log: messages are written to partitioned, ordered logs, retained for a configurable duration, and consumers maintain their own offsets into the log. Messages are not removed when consumed; the same message can be consumed by many independent consumer groups, each maintaining its own position. Kafka is designed for event sourcing, stream processing, and audit trails.
The practical differences:
- Replayability: Kafka consumers can replay from any point in the log. RabbitMQ messages are gone once acknowledged; replay requires application-level storage.
- Consumer coordination: Kafka’s consumer groups handle partition assignment and rebalancing automatically. RabbitMQ’s competing consumers are simpler but less controllable.
- Ordering: Kafka guarantees ordering within a partition. RabbitMQ guarantees ordering within a queue for single-consumer setups; competing consumers break ordering.
- Throughput: Kafka is designed for very high write throughput (millions of messages per second). RabbitMQ’s throughput is lower but sufficient for most applications.
- Complexity: Kafka requires a cluster, ZooKeeper (or KRaft in newer versions), careful partition configuration, and schema management for the event log. RabbitMQ is simpler to operate for single-instance or small-cluster deployments.
Use RabbitMQ when you need: task queues with work distribution, complex routing logic (multiple exchanges, binding patterns), transient messages that can be discarded after consumption, and relatively simple operational requirements.
Use Kafka when you need: event sourcing, stream processing with tools like Kafka Streams or Flink, replay and audit capabilities, very high throughput, and you are willing to pay the operational cost.
When AMQP Is the Right Choice
The core AMQP value proposition is: reliable, ordered, acknowledged, routable message delivery with formal guarantees about what happens when things fail.
This is the right choice for:
Work queues with multiple consumers. A queue of image processing jobs, each claimed by exactly one worker, acknowledged when the worker succeeds, requeued when the worker fails. AMQP handles this cleanly. HTTP would require a database-backed job queue, polling, and explicit locking — significant application code to replicate behavior that AMQP provides natively.
Integration between heterogeneous systems. An order placed in an e-commerce system needs to trigger fulfillment, inventory updates, and fraud detection. These systems are written by different teams, run at different rates, and have different availability guarantees. An AMQP exchange decouples them: the order system publishes; downstream systems consume at their own pace with their own acknowledgment logic.
Transient spike absorption. An email service can process 100 messages per second. An event causes 10,000 messages to arrive in the first minute. A queue absorbs the spike; the email service drains it at its own rate. This is load leveling — a fundamental pattern in reliable distributed systems — and AMQP queues implement it directly.
Guaranteed delivery where loss is unacceptable. Financial transactions, order confirmations, audit events — cases where losing a message has real consequences. AMQP’s combination of durable queues, persistent messages, and publisher confirms provides a strong delivery guarantee that HTTP request-response does not, because HTTP does not tell you what happens to a request after it is accepted.
The cases where AMQP is not the right choice: ultra-low latency requirements where broker overhead is unacceptable, IoT with constrained devices (MQTT wins there), streaming analytics where the log model matters (Kafka wins there), or simple request-response where the broker is unnecessary complexity.
ZeroMQ — Messaging Without a Broker
Every protocol in the previous two chapters runs through a broker. MQTT requires a broker. AMQP requires a broker. Kafka requires a cluster of brokers. The broker is the hub; clients are spokes; no spoke can reach another spoke directly.
The broker provides important things: durability, routing logic, queue management, acknowledgment tracking. But it also provides something you did not ask for: a centralized piece of infrastructure that can fail, that must be operated, that becomes a bottleneck, and that adds a network hop to every message.
ZeroMQ starts from a different premise. The question ZeroMQ asks is: what if the messaging capabilities lived in the application, not in infrastructure? What if the “broker” was an abstraction implemented by the library itself, peer-to-peer, without any central coordinator?
What ZeroMQ Is
ZeroMQ (also written ØMQ, ZMQ, or zmq) is a messaging library, not a message broker. You add it as a dependency to your application and get a set of socket types that implement messaging patterns at the library level. There is no server to deploy, no configuration to manage, no cluster to operate. The patterns — publish/subscribe, push/pull, request/reply, dealer/router — are implemented by the library using direct TCP (or IPC, or in-process) connections between participating processes.
This is architecturally unusual and worth thinking about carefully, because the mental model for ZeroMQ is different from every other protocol in this book.
When you create a ZeroMQ PUB socket and bind it to a port, you have created a publisher that accepts incoming connections from subscribers. When a subscriber creates a SUB socket and connects to the publisher’s port, a direct TCP connection is established between publisher and subscriber. Messages flow from publisher to subscribers over those direct connections. There is no broker in the middle.
The Socket Types
ZeroMQ’s core abstraction is the socket, but ZeroMQ sockets are nothing like BSD sockets. A ZeroMQ socket encapsulates a messaging pattern, not a connection. Under the hood, a single ZeroMQ socket may manage many simultaneous TCP connections, do internal queuing, and handle connection management automatically. From the application’s perspective, you just send and receive messages.
REQ / REP (Request-Reply)
The most basic pattern: a REQ socket sends a request and blocks waiting for a reply. A REP socket receives a request and must send exactly one reply before receiving the next request.
# Server
import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REP)
socket.bind("tcp://*:5555")
while True:
    message = socket.recv()    # blocks until a request arrives
    socket.send(b"pong")       # must reply before the next recv()

# Client (a separate process, with its own context)
import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")
socket.send(b"ping")
reply = socket.recv()          # blocks until the reply arrives
The strict alternation (send, receive, send, receive) makes REQ/REP fragile in practice: if a reply is never received, the socket is stuck. REQ/REP is useful for simple scripts and tests, but for production use you almost always want DEALER/ROUTER.
DEALER / ROUTER (Async Request-Reply)
DEALER is like REQ but non-blocking: it can send multiple messages without waiting for replies. ROUTER is like REP but address-aware: each incoming message is prefixed with the sender’s identity, and replies must include the identity to route back to the correct sender.
DEALER/ROUTER is the pattern for building async servers, load balancers, and proxies. A load balancer connects a ROUTER facing clients with a DEALER facing workers, forwarding messages between them in whatever pattern the load-balancing strategy requires.
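The identity mechanics can be seen in a minimal sketch, assuming pyzmq. It uses inproc transport so the whole exchange runs in one process; a real deployment would bind tcp:// endpoints instead:

```python
import zmq

ctx = zmq.Context()

# ROUTER server: every incoming message is prefixed with the sender's identity.
server = ctx.socket(zmq.ROUTER)
server.bind("inproc://svc")          # tcp://*:5560 between processes

# DEALER client: can send many requests without waiting for replies.
client = ctx.socket(zmq.DEALER)
client.setsockopt(zmq.IDENTITY, b"client-1")   # set before connect
client.connect("inproc://svc")

client.send(b"req-A")                # fire off two requests back to back;
client.send(b"req-B")                # a REQ socket could not do this

replies = []
for _ in range(2):
    identity, payload = server.recv_multipart()           # identity == b"client-1"
    server.send_multipart([identity, b"ack:" + payload])  # identity routes the reply
for _ in range(2):
    replies.append(client.recv())

client.close(); server.close(); ctx.term()
```

With several DEALER clients connected, the ROUTER uses the identity frame to return each reply to the socket that sent the matching request — which is exactly the bookkeeping a load balancer or async server needs.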
PUB / SUB (Publish-Subscribe)
A PUB socket publishes messages. Any number of SUB sockets can connect and receive copies. Subscribers filter messages by prefix: a subscriber that calls socket.setsockopt(zmq.SUBSCRIBE, b"sensor:") receives only messages whose payload begins with sensor:.
# Publisher
import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.PUB)
socket.bind("tcp://*:5556")
while True:
    socket.send_multipart([b"sensor:temp", b"23.5"])
    socket.send_multipart([b"sensor:humidity", b"67.2"])
An important property: PUB/SUB in ZeroMQ is lossy by default. If a subscriber is slow or a subscriber has not connected yet, the publisher does not block and does not buffer messages for that subscriber. Slow subscribers fall behind and lose messages. This is deliberate — the publisher’s throughput is not limited by the slowest subscriber.
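Both the prefix filter and the slow-joiner behavior show up in a self-contained sketch, assuming pyzmq (inproc here so it runs in one process; tcp between processes):

```python
import time
import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("inproc://sensors")          # tcp://*:5556 between processes

sub = ctx.socket(zmq.SUB)
sub.connect("inproc://sensors")
sub.setsockopt(zmq.SUBSCRIBE, b"sensor:")   # prefix filter

# Slow-joiner: anything published before the subscription is
# registered is silently dropped, so give it a moment to settle.
time.sleep(0.2)

pub.send_multipart([b"sensor:temp", b"23.5"])
pub.send_multipart([b"log:debug", b"dropped"])   # fails the prefix filter
pub.send_multipart([b"sensor:humidity", b"67.2"])

received = [sub.recv_multipart() for _ in range(2)]
sub.close(); pub.close(); ctx.term()
```

Remove the sleep and the first messages may never arrive — the publisher sends before the subscription reaches it, and those messages are gone without any error.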
If you need guaranteed delivery in a PUB/SUB pattern, you need to add reliability on top: a subscriber can request missed messages via a separate REQ/REP channel, or you can add a persistent intermediary.
PUSH / PULL (Pipeline)
PUSH sends messages round-robin to connected PULL sockets. PULL receives from any connected PUSH socket. This is the work queue pattern without a broker: a PUSH socket distributes tasks; PULL sockets (workers) claim them.
[ Ventilator (PUSH) ] → [ Worker 1 (PULL) ]
                      → [ Worker 2 (PULL) ]
                      → [ Worker 3 (PULL) ]
Workers process tasks independently and send results to a PUSH socket connected to a collector’s PULL socket. The ventilator–worker–collector pipeline is a classic ZeroMQ pattern and works well for embarrassingly parallel computation.
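The full ventilator–worker–collector pipeline can be sketched in one process using threads, assuming pyzmq (inproc transports stand in for the tcp:// endpoints a distributed deployment would use; the sleep is a crude way to let both workers connect so the round-robin split is even):

```python
import threading
import time
import zmq

ctx = zmq.Context()

ventilator = ctx.socket(zmq.PUSH)   # distributes tasks round-robin
ventilator.bind("inproc://tasks")   # tcp://*:5557 between machines
collector = ctx.socket(zmq.PULL)    # fan-in of results from all workers
collector.bind("inproc://results")

def worker():
    tasks = ctx.socket(zmq.PULL)
    tasks.connect("inproc://tasks")
    results = ctx.socket(zmq.PUSH)
    results.connect("inproc://results")
    while True:
        msg = tasks.recv()
        if msg == b"stop":          # poison pill ends the worker
            break
        results.send(str(int(msg) ** 2).encode())   # "process" the task
    tasks.close(); results.close()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
time.sleep(0.2)   # let both workers connect before distributing work

for x in range(10):
    ventilator.send(str(x).encode())
total = sum(int(collector.recv()) for _ in range(10))   # 0 + 1 + 4 + ... + 81

for _ in threads:
    ventilator.send(b"stop")        # round-robin delivers one pill per worker
for t in threads:
    t.join()
ventilator.close(); collector.close(); ctx.term()
```

Note what is missing: if a worker died mid-task, its task would simply vanish. The pattern distributes work; it does not guarantee completion.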
PAIR
PAIR connects exactly two sockets. Useful for intra-process or inter-thread communication where you want the simplicity of a dedicated channel without the overhead of managing connections manually.
The Transport Layer
ZeroMQ supports multiple transport mechanisms, switchable by changing the address prefix:
- tcp:// — Standard TCP connections, for inter-process and inter-host communication
- ipc:// — Unix domain sockets, for inter-process communication on the same host (faster than TCP loopback)
- inproc:// — In-process communication, for inter-thread messaging without any OS involvement (fastest possible; zero serialization overhead when using the same context)
- pgm:// and epgm:// — Pragmatic General Multicast, for UDP multicast (rarely used; requires multicast-capable network infrastructure)
The transport is a concern of the address string, not the socket type. A PUB socket can bind on tcp:// in one deployment and switch to ipc:// in another without any code change.
What ZeroMQ Does Well
Raw throughput. ZeroMQ is fast. The library does minimal copying, uses lock-free data structures internally where possible, and batches small messages automatically. Published benchmarks have shown ZeroMQ achieving millions of messages per second on commodity hardware. The overhead per message is measured in microseconds, not milliseconds.
Flexibility in topology. ZeroMQ does not impose a topology. You can build star networks, rings, pipelines, trees, meshes — any topology that your application requires. The same socket types work in all of them. This is particularly useful for systems where the communication topology changes based on runtime conditions.
No infrastructure dependencies. Having no broker to deploy, configure, and maintain is a genuine advantage wherever operational complexity carries a real cost. The messaging is part of the application; it deploys with the application; it fails with the application rather than independently.
Language breadth. ZeroMQ bindings exist for essentially every language: C, C++, Python, Java, Go, Rust, Ruby, Node.js, Erlang, and many more. The core library is C, and the bindings are thin wrappers. Messages cross language boundaries with zero transformation overhead.
Dynamic topology. ZeroMQ handles connection management automatically. A PUB socket bound to a port accepts new SUB connections as they arrive; it detects and handles disconnections. A DEALER that connects to multiple ROUTER endpoints and one goes down will route around it when the reconnect timer fires. Topology changes do not require restart or reconfiguration.
What ZeroMQ Does Not Do
No persistence. ZeroMQ does not persist messages to disk. If no SUB socket is connected when a PUB socket sends, the message is gone. If a PULL worker crashes mid-task, the task is gone. ZeroMQ is purely an in-memory, in-process messaging layer.
No acknowledgment. ZeroMQ has no native acknowledgment primitive. Send returns when the message is in the outgoing queue; there is no notification when the receiver processes it. Building at-least-once delivery requires application code that adds acknowledgment messages and resend logic.
No routing beyond prefix filtering. PUB/SUB subscription filters are prefix-based string matching. There is no content-based routing, no topic hierarchy with wildcards, no exchange-and-binding model. If you need to route messages based on complex criteria, you implement the routing logic in your application.
No authorization or encryption in the core. ZeroMQ’s base implementation is unauthenticated and unencrypted. ZAP (ZeroMQ Authentication Protocol) provides an authentication layer, and CURVE is an elliptic-curve encryption and authentication layer. Both work well, but they are not defaults — you must explicitly configure them, and few tutorials show you how.
Complex patterns require discipline. The ZeroMQ guide (known as the “zguide”) is excellent, but it runs to several hundred pages for a reason. Building reliable messaging on top of ZeroMQ’s primitives requires understanding failure modes — what happens when a subscriber falls behind, when a DEALER’s connected ROUTER goes down, when the network partitions — and implementing appropriate handling. AMQP’s broker handles these failure modes for you; ZeroMQ exposes them to you.
Nanomsg and NNG
Nanomsg was created by Martin Sústrik, one of ZeroMQ’s original authors, as a cleaner reimplementation that addressed some of ZeroMQ’s design decisions (the C++ codebase, the license, the architecture of the library). NNG (Nanomsg Next Generation) is a further evolution by Garrett D’Amore.
NNG supports the same socket patterns (PUB/SUB, REQ/REP, PUSH/PULL, and others) with a cleaner C API, more transport options (TLS built in, WebSocket support), and an aio (asynchronous I/O) model. For new projects that want ZeroMQ-style messaging without ZeroMQ’s historical baggage, NNG is worth evaluating.
When ZeroMQ Fits
ZeroMQ is the right choice when:
You need low-latency messaging between cooperating processes and the overhead of a broker is unacceptable. Financial systems that route orders between components with microsecond budgets use ZeroMQ. HFT firms use ZeroMQ. Real-time game servers use ZeroMQ.
You need flexible topology that would require complex configuration in a broker-based system. If your communication graph changes dynamically or has an unusual shape, ZeroMQ’s lack of a central coordinator can simplify the topology rather than complicate it.
You are in an environment where broker deployment is impractical. Embedded systems, edge computing, development environments, or systems that must function without infrastructure — these benefit from ZeroMQ’s library-level approach.
You are building the messaging infrastructure rather than using it. ZeroMQ is the right foundation for building higher-level messaging abstractions because its socket primitives are low enough to compose.
ZeroMQ is the wrong choice when:
You need durability. If messages must survive process restarts, power failures, or network partitions, you need a broker with persistent storage.
You need guaranteed delivery without building it yourself. At-least-once and exactly-once semantics require acknowledgment, storage, and resend logic — all things a broker provides.
Your team is unfamiliar with the failure modes. ZeroMQ’s flexibility comes with the responsibility of handling failures that a broker would handle for you. The learning curve is real, and the mistakes are subtle (silent message loss is ZeroMQ’s most common surprise).
The choice between ZeroMQ and a broker-based system is fundamentally a choice about where you want the complexity to live: in infrastructure that someone else operates (broker), or in application code that you write and maintain (ZeroMQ). Neither is universally better. The right choice depends on whether your reliability requirements are better met by infrastructure guarantees or by application-level control.
Cap’n Proto RPC and the Binary Frontier
In 2013, Kenton Varda — one of the primary authors of Protocol Buffers while at Google — published Cap’n Proto. His stated goal was to fix what he saw as Protocol Buffers’ fundamental design mistake: encoding. Protocol Buffers serializes data by traversing a data structure and writing fields sequentially to a byte buffer. Cap’n Proto does not serialize at all. The wire format is the in-memory format.
This is not a marketing claim. It is a specific design decision with measurable consequences, and understanding it changes how you think about the entire category of binary serialization protocols.
The Zero-Copy Design
When you decode a Protocol Buffers message, the library reads bytes from the wire and constructs objects in memory — copying data, allocating memory, parsing varint-encoded integers. When you are done with the objects, they are freed or garbage-collected, depending on the language. Serialization is the reverse: traverse the objects, emit bytes. Every message crossing a process boundary involves at minimum two full traversals and two copies.
Cap’n Proto’s wire format is a memory layout that can be mmap’d and used directly. A Cap’n Proto message arriving from the network is a sequence of bytes that is already a valid in-memory representation. Accessing a field in a Cap’n Proto message is a pointer dereference, not a parse. There is no decode step.
This is possible because Cap’n Proto uses a flat, pointer-based representation:
- Structs are laid out as a fixed-size data section (for primitive fields) followed by a pointer section (for variable-length fields, lists, and nested structs)
- Pointers are relative offsets, so the structure is position-independent (can be moved in memory without fixing up pointers)
- Lists are contiguous arrays with a list pointer header specifying element type and count
- The entire message is a sequence of “segments” that can be transmitted across different memory allocations or memory-mapped from files
The result: reading a field from a received Cap’n Proto message has zero allocation overhead and requires only a bounds-checked pointer arithmetic operation. For read-heavy workloads — parsing configuration, deserializing cache entries, reading from a file format — this is a substantial performance difference.
The Schema Language
Cap’n Proto schemas have a superficial resemblance to Protocol Buffers but differ in several important ways:
@0xd5d8c89b5e00d43c;   # unique file ID

struct PaymentRequest {
  idempotencyKey @0 :Text;
  amountCents @1 :Int64;
  currency @2 :Text;
  sourceToken @3 :Text;
  metadata @4 :List(KeyValue);
}

struct KeyValue {
  key @0 :Text;
  value @1 :Text;
}

interface PaymentService {
  charge @0 (request :PaymentRequest) -> (response :PaymentResponse);
  streamEvents @1 (filter :EventFilter) -> (stream :StreamHandle);
}
Field numbering is explicit and permanent. Like Protocol Buffers, fields are identified by integers, enabling backward-compatible evolution. Unlike Protocol Buffers, the field ordering affects the memory layout — higher-numbered fields cannot always be efficiently added without considering the layout.
Unique file IDs. The @0xd5d8c89b5e00d43c at the top is a unique identifier for this schema file. It enables schema-aware tools (like the Cap’n Proto reflection API) to identify schemas without relying on file paths.
Interface types. Cap’n Proto schemas can define interface types (RPC services), not just data types. This unifies the schema and the RPC definition — the same file describes both the data format and the service interface.
Promise Pipelining — The Unusual Part
Cap’n Proto RPC includes a feature called promise pipelining (or time-travel RPC) that has no equivalent in gRPC, Thrift, or any other mainstream RPC framework.
In conventional RPC, calling a method that returns an object and then calling a method on that object requires two round trips:
Client → Server: getUser(id)
Server → Client: User { profile_capability: <capability_ref> }
Client → Server: profile_capability.getProfile()
Server → Client: Profile { ... }
With Cap’n Proto promise pipelining, you can chain calls before the first one completes:
Client → Server: getUser(id)
Client → Server: [promise from getUser].profile_capability.getProfile()
The second call is sent immediately, referring to the result of the first call by a “promise reference.” The server receives both messages and can pipeline them — it processes the first, and when the result is ready, processes the second using that result, without requiring another client-server round trip.
For systems with high latency between components (cross-datacenter RPCs, satellite links), this can reduce latency proportional to the number of chained calls. A chain of five sequential calls that would require five round trips requires only one round trip with promise pipelining.
This is implemented through Cap’n Proto’s capability system. Rather than returning raw data that the client must use to construct the next request, Cap’n Proto methods return capabilities — unforgeable references to objects that the client can call methods on. The capability table in a Cap’n Proto message is what enables promise pipelining at the protocol level.
Capabilities as Security Primitives
Cap’n Proto’s capability model has implications beyond performance.
A capability is a reference that grants authority to invoke operations on the referenced object. You cannot forge a capability — you can only acquire one by receiving it from someone who already has it. This is the object-capability model, and it is a formal security model with properties that ACL-based systems do not have.
In practice: if a Cap’n Proto server returns a capability to a resource, the client can invoke that capability. If the server does not return a capability, the client has no way to manufacture one. Least-privilege access control falls out naturally from the capability model — you give clients access to exactly what they need by giving them capabilities to exactly those objects.
This sounds theoretical but has practical consequences. A service that uses Cap’n Proto capabilities can ensure that clients can only access objects they have been explicitly granted access to, without maintaining a separate authorization table. The capability is the authorization; holding the capability proves authorization.
FlatBuffers: Google’s Answer
FlatBuffers was developed at Google, also pursuing zero-copy serialization. The design is similar to Cap’n Proto in the key respect: the wire format is directly usable as an in-memory format without a decode step.
The differences are in the design details:
- FlatBuffers does not have a native RPC system (Cap’n Proto does)
- FlatBuffers has better support for table modifications (adding fields with forward compatibility) without rewriting the entire buffer
- FlatBuffers is more permissive about mutating values in-place after creation
- FlatBuffers uses a builder pattern that results in back-to-front construction (objects are written in reverse, and the root object is at the end of the buffer); Cap’n Proto uses a more direct construction model
- Cap’n Proto has better support for very large messages that don’t fit in a single allocation (through its segment model)
FlatBuffers is widely used in game development (it was created for game asset serialization) and in latency-critical financial systems. Cap’n Proto is used in systems where the RPC layer and capability security model matter, most notably Cloudflare Workers (which uses Cap’n Proto for internal RPC between worker processes).
MessagePack
MessagePack is a different point in the design space: not zero-copy, but compact binary JSON. It uses a format that closely mirrors JSON’s type system (null, boolean, integer, float, string, array, map) but encodes values in binary rather than text.
The key property: MessagePack messages are smaller than equivalent JSON and faster to parse than JSON, but there is no schema — the structure is ad-hoc, like JSON. MessagePack is appropriate when you want the flexibility of JSON with better performance, and you do not want to commit to a schema. It is not appropriate when you need the type safety, validation, or performance of a schema-based format.
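To make the size savings concrete, here is a toy encoder covering a small subset of the MessagePack format (positive fixint, fixstr, fixmap) — an illustration only; real code would use the msgpack library:

```python
import json

def pack(obj) -> bytes:
    """Toy encoder for a subset of the MessagePack spec: positive
    fixint, fixstr, and fixmap. Enough to encode small string-keyed
    maps and see where the savings over JSON text come from."""
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                    # positive fixint: 1 byte total
    if isinstance(obj, str):
        b = obj.encode("utf-8")
        if len(b) <= 31:
            return bytes([0xA0 | len(b)]) + b  # fixstr: 1-byte length header
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytearray([0x80 | len(obj)])     # fixmap: 1-byte pair-count header
        for k, v in obj.items():
            out += pack(k) + pack(v)
        return bytes(out)
    raise ValueError("value outside the toy subset")

doc = {"amount": 42, "currency": "USD"}
sizes = (len(pack(doc)), len(json.dumps(doc).encode()))  # binary vs JSON text
```

For this small document the binary form is 22 bytes against 33 for the JSON text: the quoting, braces, and digit characters of JSON are replaced by one-byte headers and a raw integer.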
Redis bundles a MessagePack codec (cmsgpack) in its Lua scripting environment. Many game backends use MessagePack for client-server communication where the flexibility of a schemaless format outweighs what a schema-based format would buy them.
The Adoption Reality
Cap’n Proto is technically impressive. The zero-copy design, promise pipelining, and capability model are all genuine innovations. The adoption, however, is limited.
The reasons are practical:
- Language support is narrow. C++ and Rust have mature implementations. Python and Java have workable implementations. Other languages are hit-or-miss.
- The tooling ecosystem is thin compared to Protocol Buffers.
- Promise pipelining requires both client and server to be aware of it; mixed deployments with gRPC backends cannot benefit from it.
- The capability model is powerful but requires adopting a programming model that is unfamiliar to most engineers.
Cap’n Proto’s primary real-world adoption is at Cloudflare (whose Workers runtime, led by Cap’n Proto’s author, uses it throughout) and in systems where the zero-copy performance matters more than the library ecosystem. For most teams, Protocol Buffers offers a better tradeoff between performance and ecosystem.
The Binary Frontier in Context
The protocols in this chapter — Cap’n Proto, FlatBuffers, MessagePack — represent a category of choice: binary serialization formats that are faster and more compact than JSON. They differ in whether they include an RPC layer (Cap’n Proto does; FlatBuffers and MessagePack do not), whether they require a schema (Cap’n Proto and FlatBuffers do; MessagePack does not), and how they handle encoding (zero-copy versus full decode).
The practical question is not “which binary format is theoretically best?” It is “what does my system need, and what can my team operate?”
If you are at the scale where JSON serialization overhead is a measurable cost — millions of messages per second, latency budgets in microseconds, processes that spend meaningful CPU time in JSON parse — binary serialization is worth the investment. Protocol Buffers is the safe default with the widest language support. FlatBuffers is the right choice if zero-copy matters and your language has a mature FlatBuffers library. Cap’n Proto is the right choice if you want the RPC layer and the capability model and you are committed to the C++ or Rust ecosystem.
If you are not at that scale, the performance of JSON is probably not your bottleneck, and the operational simplicity of human-readable payloads is worth more than a few microseconds of parse time.
Lesser-Known Contenders
The protocols in the preceding chapters are the ones that come up in engineering discussions and appear in job descriptions. But the protocol landscape is wider than any canonical list. This chapter covers the protocols that are real, used in production, and worth understanding — even if they do not dominate conference talks.
Apache Thrift
Apache Thrift was developed at Facebook and open-sourced in 2007, entering the Apache Incubator the following year. It predates gRPC by several years and solves the same core problem: define a service interface once and generate client and server code in multiple languages.
A Thrift definition file describes services and data types:
namespace py payments
namespace java com.example.payments

enum ChargeStatus {
  SUCCESS = 1,
  DECLINED = 2,
  ERROR = 3
}

struct ChargeRequest {
  1: required string idempotency_key,
  2: required i64 amount_cents,
  3: required string currency,
  4: optional string source_token
}

struct ChargeResponse {
  1: required string transaction_id,
  2: required ChargeStatus status
}

service PaymentService {
  ChargeResponse charge(1: ChargeRequest request)
    throws (1: InvalidRequestException invalid, 2: ServiceException error)
}
Thrift generates client stubs and server skeletons for a wide range of languages. Where it diverges from gRPC is in its transport and protocol flexibility: Thrift separates transport (TCP, HTTP, in-memory) from protocol (binary, compact binary, JSON) from service (the actual method dispatch). You choose a combination. A development server might use the JSON protocol over HTTP for debuggability; production uses the compact binary protocol over TCP for efficiency.
Thrift’s adoption peaked when it was the dominant cross-language RPC framework before gRPC existed. Its continued use today is largely in organizations that built on Thrift before gRPC was available and have too much investment to migrate. Twitter, Evernote, and many others ran on Thrift for years. The framework is mature, battle-tested, and well-understood; it is just no longer the default choice for new systems.
The comparison with gRPC: Thrift has more flexible transport/protocol combinations and no dependency on HTTP/2. gRPC has better ecosystem support today, server reflection, and streaming that is integrated into the protocol rather than added as an extension. For new polyglot RPC projects, gRPC wins on ecosystem; Thrift wins if you specifically need the transport flexibility or are in a Java-heavy organization with existing Thrift infrastructure.
Apache Avro RPC
Apache Avro is primarily known as a data serialization format used in the Hadoop ecosystem. Its schema language describes data structures in JSON:
{
  "type": "record",
  "name": "ChargeRequest",
  "fields": [
    {"name": "idempotency_key", "type": "string"},
    {"name": "amount_cents", "type": "long"},
    {"name": "currency", "type": "string"},
    {"name": "source_token", "type": ["null", "string"], "default": null}
  ]
}
Avro RPC uses these schemas for service definition, with the interesting property that schema exchange happens at connection time: the client and server send each other their schemas during the handshake, and the protocol handles schema evolution by mapping between the writer’s schema (the schema the message was written with) and the reader’s schema (the schema the reader uses now). This means a reader can evolve its schema independently of the writer, as long as the schemas are compatible.
In practice, Avro RPC’s adoption as a general-purpose RPC mechanism is minimal. Its primary use is within the Kafka ecosystem, where Avro is the standard format for Kafka messages paired with the Confluent Schema Registry. The Schema Registry stores schema versions, and Avro consumers use schema resolution to handle producers and consumers that are on different schema versions. If you are using Kafka with Avro, you are using Avro’s schema evolution model. If you are not in the Kafka/Hadoop ecosystem, you are probably not using Avro RPC.
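The writer/reader resolution rule can be sketched in a few lines. This is an illustration of the idea, not Avro's actual implementation (which also handles type promotion, unions, and aliases); it uses the ChargeRequest schema from above:

```python
# The reader's (current) schema, matching the ChargeRequest record above.
reader_schema = {
    "type": "record",
    "name": "ChargeRequest",
    "fields": [
        {"name": "idempotency_key", "type": "string"},
        {"name": "amount_cents", "type": "long"},
        {"name": "currency", "type": "string"},
        {"name": "source_token", "type": ["null", "string"], "default": None},
    ],
}

def resolve(reader_schema, writer_record):
    """Sketch of Avro-style schema resolution: take fields the writer
    provided as-is; fall back to the reader's default for fields the
    writer's (older) schema did not have."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_record:
            out[name] = writer_record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return out

# A record written before source_token existed still resolves cleanly.
old_record = {"idempotency_key": "k1", "amount_cents": 1000, "currency": "USD"}
resolved = resolve(reader_schema, old_record)
```

The key point is that the reader never needs the writer to upgrade: as long as new fields carry defaults, old records resolve against the new schema.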
NATS
NATS is a messaging system written in Go, created by Derek Collison (originally at Apcera; stewardship later passed to his company Synadia) and released as open source. It occupies an interesting position: lighter than Kafka, simpler than RabbitMQ, distributed natively, and fast.
The core model is publish/subscribe over subjects (NATS’s term for topics). A publisher sends a message to a subject; subscribers that have expressed interest in that subject receive it. Subject matching supports wildcards: sensors.> matches any subject that starts with sensors.; sensors.* matches exactly one token after sensors..
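The wildcard semantics are easy to pin down with an illustrative reimplementation (NATS servers do this matching internally; this sketch just makes the rules concrete):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style subject matching: subjects are dot-separated tokens,
    '*' matches exactly one token, '>' matches one or more trailing
    tokens."""
    pt = pattern.split(".")
    st = subject.split(".")
    for i, tok in enumerate(pt):
        if tok == ">":
            return len(st) > i       # '>' must cover at least one token
        if i >= len(st) or (tok != "*" and tok != st[i]):
            return False
    return len(pt) == len(st)        # '*' patterns must consume every token
```

So sensors.> matches sensors.temp and sensors.temp.room1 but not the bare subject sensors, while sensors.* matches sensors.temp only.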
NATS is unusual in having a request-reply pattern built into the base protocol. A publisher can send a message with a reply-to subject; the recipient publishes its response to that reply-to subject. NATS clients make this pattern ergonomic by automatically generating unique reply-to subjects and handling the subscription lifecycle.
NATS JetStream, added in 2021, extends the base NATS model with persistence, at-least-once and exactly-once delivery, consumer groups (for competing consumers), stream replay, and key-value stores. JetStream transforms NATS from a best-effort messaging system into a full-featured event streaming platform that competes with Kafka in its target use case.
The case for NATS over Kafka: simpler operational model (NATS is a single binary; Kafka requires ZooKeeper or KRaft plus the brokers themselves), lower latency for smaller message volumes, built-in request-reply semantics, and strong Go and Rust library support. The case for Kafka over NATS: higher throughput for very large volumes, more mature stream processing ecosystem (Kafka Streams, Flink integration), and broader enterprise adoption.
NATS is a strong choice for teams that want Kafka-like semantics with less operational overhead. It is particularly popular in Kubernetes environments, where the operational simplicity aligns well with the container deployment model.
Twirp
Twirp is a minimalist RPC framework developed by Twitch, built on Protocol Buffers and HTTP/1.1 (and HTTP/2) rather than the full gRPC protocol. The service definition uses the same .proto format as gRPC, but the generated client/server code communicates over plain HTTP with either binary (protobuf) or JSON content.
Twirp’s appeal is that it sidesteps exactly the gRPC deployment problems described in Chapter 2: it works over standard HTTP/1.1, passes through HTTP load balancers without L7 awareness, supports browser clients without a proxy layer, and can be debugged with curl when using the JSON content type.
The tradeoff: Twirp does not support streaming. It is strictly unary RPC — one request, one response. If you need server streaming, client streaming, or bidirectional streaming, you need gRPC.
For teams that want the ergonomics of protobuf code generation and type-safe RPC without the operational complexity of gRPC, Twirp is a reasonable choice. It is a simpler system that is easier to operate, debug, and integrate with standard HTTP infrastructure, at the cost of streaming support.
JSON-RPC and XML-RPC
JSON-RPC is a stateless remote procedure call protocol that uses JSON for encoding and HTTP (or WebSockets, or any transport) for transmission. A JSON-RPC request is:
{
  "jsonrpc": "2.0",
  "method": "payment.charge",
  "params": {"amount_cents": 1000, "currency": "USD"},
  "id": "req-123"
}
The response:
{
  "jsonrpc": "2.0",
  "result": {"transaction_id": "txn_abc123", "status": "SUCCESS"},
  "id": "req-123"
}
JSON-RPC is simple enough to implement in an afternoon. It provides a method-based calling convention over JSON, with batch request support and a standard error format. It is used heavily in the Ethereum ecosystem (the standard JSON-RPC API for interacting with Ethereum nodes follows this protocol) and in VS Code’s Language Server Protocol.
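A dispatcher for single (non-batch) requests fits in a few lines. The method table here is hypothetical — "payment.charge" stands in for a real handler — but the envelope and the -32601 error code follow the JSON-RPC 2.0 specification:

```python
import json

# Hypothetical method table; real handlers would do real work.
METHODS = {
    "payment.charge": lambda params: {
        "transaction_id": "txn_abc123",
        "status": "SUCCESS",
    },
}

def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string, return the response string."""
    req = json.loads(raw)
    method = METHODS.get(req.get("method"))
    if method is None:
        # -32601 is the spec's standard code for an unknown method.
        body = {"jsonrpc": "2.0",
                "error": {"code": -32601, "message": "Method not found"},
                "id": req.get("id")}
    else:
        body = {"jsonrpc": "2.0",
                "result": method(req.get("params", {})),
                "id": req.get("id")}
    return json.dumps(body)
```

Note that the transport never appears: handle() could sit behind an HTTP POST endpoint, a WebSocket message handler, or a raw TCP stream without changing.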
XML-RPC is the ancestor of JSON-RPC, replacing JSON with XML. It predates both REST and SOAP and is almost entirely confined to legacy systems today. It is worth knowing as history — it demonstrates that non-REST, method-based RPC over HTTP existed long before gRPC — but you would not choose it for new work.
GraphQL Over WebSocket
GraphQL is primarily associated with HTTP, but its subscription model — real-time data that the server pushes to the client when data changes — is typically implemented over WebSockets using a protocol called graphql-ws (the current standard) or the older subscriptions-transport-ws.
The graphql-ws protocol is a thin framing layer over WebSockets: clients send subscribe/unsubscribe messages with GraphQL subscription documents; servers send data, error, and completion messages. The transport is WebSockets; the query language is GraphQL’s subscription syntax.
This is relevant to the protocol discussion because it represents a class of hybrid design: take a query language designed for HTTP request-response, and extend it to server push by swapping the transport for WebSockets. The subscription model fits naturally because GraphQL subscriptions define exactly what data the client wants to receive, and the server evaluates that subscription on every relevant event.
If you are already using GraphQL for your API, the WebSocket subscription mechanism is usually the right way to add real-time push: it reuses the existing schema, the existing authentication and authorization model, and the existing client libraries. If you are not using GraphQL, this is not a reason to adopt it.
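The framing layer is thin enough to show as plain JSON. A sketch of the message exchange in the style of the graphql-ws protocol; the subscription document and field names are illustrative:

```python
import json

# Client -> server: initialize the connection, then subscribe.
connection_init = {"type": "connection_init"}
subscribe = {
    "id": "1",  # client-chosen operation id, correlates all later messages
    "type": "subscribe",
    "payload": {"query": "subscription { orderUpdated { id status } }"},
}

# Server -> client: acknowledge, then a stream of results, then completion.
connection_ack = {"type": "connection_ack"}
next_msg = {
    "id": "1",
    "type": "next",
    "payload": {"data": {"orderUpdated": {"id": "o-42", "status": "SHIPPED"}}},
}
complete = {"id": "1", "type": "complete"}

for msg in (connection_init, subscribe, connection_ack, next_msg, complete):
    print(json.dumps(msg))
```

Everything GraphQL-specific lives inside the payload fields; the envelope itself is just subscription lifecycle management over a WebSocket.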
Honorable Mentions
gRPC-Web deserves mention as a distinct protocol from gRPC. It strips out the parts of gRPC that do not work in browsers (trailer-based status codes, client streaming) and encodes messages in a format that browser fetch APIs can handle. It requires a proxy (Envoy or grpc-web-proxy) to translate to real gRPC for the backend. It is not a general replacement for gRPC but a specific bridge for browser clients.
Cap’n Proto’s siblings: FlatCC and Bebop are alternative zero-copy serialization tools with different language targets and performance characteristics. Bebop in particular generates very fast code for .NET environments.
OpenTelemetry Protocol (OTLP) is not a general-purpose RPC protocol but worth knowing: it is a Protocol Buffers-based protocol over HTTP/2 or gRPC used for transmitting telemetry data (metrics, traces, logs). If you are building observability pipelines, OTLP is the emerging standard that replaces custom Jaeger and Zipkin formats.
Protobuf over HTTP (sometimes called “proto over REST”) is the approach of using Protocol Buffers as the serialization format for HTTP/1.1 endpoints, without the gRPC protocol layer. You get the schema benefits and compact encoding of protobuf, with none of the gRPC operational complexity and full compatibility with standard HTTP infrastructure. Twirp is one formalization of this pattern; hand-rolled implementations are also common. This is underused — many teams that want better-than-JSON serialization reach for gRPC when protobuf-over-HTTP would meet their needs with less overhead.
The Pattern Across All of These
Looking across the lesser-known contenders, a pattern emerges: most of them are solving one or two specific problems with mainstream protocols, and the solution comes with tradeoffs that explain why they did not displace the mainstream options.
Thrift solved polyglot RPC before gRPC existed, but without HTTP/2’s transport and without gRPC’s ecosystem investment. NATS solved lightweight pub/sub without Kafka’s operational complexity, but without Kafka’s throughput. Twirp solved gRPC’s browser and load-balancer problems, but without streaming. JSON-RPC added method-dispatch semantics to JSON, but without schemas or types.
The mainstream protocols won by being good enough across many dimensions simultaneously. The lesser-known ones are often better on one specific dimension and worse on others. That is useful information: if the specific dimension where the lesser-known option excels matches your primary constraint, it might be the right tool regardless of its market position.
What Nobody Has Tried Yet
Every protocol in this book was a response to a specific constraint: JSON too slow, request-response too limiting, brokers too complex, serialization too expensive. The constraints that drove those designs are well understood. The protocols that will exist in ten years will respond to constraints that are just now becoming visible, or that we are only beginning to articulate clearly.
This chapter is speculative. It is about directions, not products. Some of these ideas are active research areas with prototypes; some are directions that the constraints of distributed systems seem to be pointing toward without anyone yet having synthesized the right design; some are genuine open questions. Treat it as a map of the unexplored territory rather than a guide to specific destinations.
QUIC-Native Protocols
HTTP/3 runs HTTP semantics over QUIC. But QUIC is a general-purpose transport layer with properties that HTTP does not fully exploit.
QUIC gives you:
- Independent streams that do not suffer head-of-line blocking (a lost packet affects only the stream it belongs to, not others on the same connection)
- 0-RTT connection establishment for resumed connections (no handshake round trip for known servers)
- Connection migration (a QUIC connection can survive the client’s IP address changing — when a mobile device transitions from WiFi to cellular, the connection continues)
- Built-in loss detection and recovery per-stream rather than per-connection
gRPC over QUIC is a real, ongoing effort, but it layers gRPC’s existing HTTP/2 semantics onto QUIC; it does not redesign the protocol to take advantage of what QUIC enables.
The more interesting possibility is a protocol designed from scratch for QUIC’s semantics: one that treats streams as first-class primitives, uses connection migration for seamless handoff between networks, and takes advantage of 0-RTT for latency-critical applications. The mobile use case is compelling: an API protocol where the client transitions from WiFi to cellular mid-session without any connection interruption, reconnection delay, or application-level retry. That is technically possible with QUIC today; no widely-deployed API protocol has been designed specifically for it.
MoQ (Media over QUIC), currently in IETF standardization, is an early example of a non-HTTP QUIC protocol — specifically for live video and game streaming. Its design choices (object-based delivery, subscriber relay model, prioritized delivery within a connection) would not be possible with TCP and are not natural within HTTP’s request-response model. MoQ is unlikely to be the final word, but it demonstrates that QUIC-native protocol design is a real space.
Algebraic Effects as an RPC Model
Contemporary RPC is a remote procedure call: you invoke a function, wait for a result, use the result. The caller blocks (or uses a callback/promise/async/await to avoid blocking the thread) until the callee responds. This is a specific programming model, and it carries specific limitations.
Algebraic effects are a programming language concept (implemented in languages like Koka and Eff, and shipped as effect handlers in OCaml 5) that generalizes exception handling: rather than throwing an exception that unwinds the stack, you “raise” an effect, which is handled by an effect handler further up the call stack. Unlike exceptions, effects are resumable: the handler can resume the computation where it left off.
Applied to RPC, this suggests a different interface: rather than calling a remote function and waiting, you express your computation in terms of effects, and the runtime handles the question of whether those effects are satisfied locally or remotely. A function that needs data from a remote service does not explicitly make an RPC call; it raises an effect, and the effect handler decides how to satisfy it — maybe by making a network call, maybe by returning cached data, maybe by batching multiple effect requests into a single network round trip.
This is not currently implemented at the protocol level. Languages with algebraic effects are research or niche. But the concept points at something real: the current RPC model mixes data access patterns (which data do I need?) with transport concerns (how do I get it?), and that mixing is the source of many of the problems that libraries like DataLoader (for GraphQL N+1 queries) try to solve after the fact.
A protocol built around effect-based semantics would separate these concerns: the application declares its data needs; the runtime decides how to satisfy them efficiently. The protocol layer would need to support batching, merging requests, pipelining, and streaming in ways that the application code does not explicitly orchestrate.
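Python has no algebraic effects, but generators can simulate the resumable-handler idea well enough to show the separation of concerns. In this sketch (all names hypothetical), application code raises a fetch effect instead of making an RPC call, and the handler batches every pending effect into one simulated round trip before resuming the computations:

```python
def get_user(uid):
    # Application code: raise a "fetch" effect instead of calling the network.
    profile = yield ("fetch", "user", uid)
    return f"{profile['name']} <{profile['email']}>"

def run_batched(tasks, backend):
    """Effect handler: collect one pending fetch per task, satisfy the whole
    batch in a single simulated round trip, then resume each computation."""
    pending = {g: g.send(None) for g in tasks}  # run each to its first effect
    results = {}
    while pending:
        # One "network round trip" for the entire batch of effects.
        batch = {g: backend(*eff) for g, eff in pending.items()}
        next_pending = {}
        for g, value in batch.items():
            try:
                next_pending[g] = g.send(value)  # resume with the result
            except StopIteration as stop:
                results[g] = stop.value
        pending = next_pending
    return results

# A fake backend standing in for the remote service.
DB = {1: {"name": "Ada", "email": "ada@example.com"},
      2: {"name": "Lin", "email": "lin@example.com"}}

def fake_backend(kind, table, key):
    return DB[key]

out = run_batched([get_user(1), get_user(2)], fake_backend)
print(list(out.values()))  # ['Ada <ada@example.com>', 'Lin <lin@example.com>']
```

Neither get_user nor the effect tuple mentions transport at all; whether the handler batches, caches, or goes to the network is entirely its decision, which is the separation the chapter describes.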
Local-First and Offline-First APIs
The dominant API model assumes network connectivity. A client calls a server; the server holds the ground truth; the client is a thin display layer that becomes useless when the network is unavailable. This assumption is so deeply embedded in REST’s design that it is invisible until you need to violate it.
The local-first software movement (articulated clearly by Kleppmann et al. in their 2019 paper) argues for an architecture where the ground truth lives in local storage on the user’s device, and the network is a synchronization mechanism rather than a data source. Offline operation is the default; synchronization happens when connectivity is available.
Technically, this requires:
Conflict-free Replicated Data Types (CRDTs): data structures that can be modified independently on multiple devices and merged without conflicts. The merge operation is commutative, associative, and idempotent, so merge order does not matter. The challenge is designing CRDT data models that match application semantics rather than just set operations.
Vector clocks and causal consistency: tracking causality across replicas so that “A happened before B” relationships are preserved during synchronization, even when the devices were disconnected at the time the events occurred.
Sync protocols: the actual protocol for exchanging state between replicas. This is an open design space. Existing approaches (rsync-style diffing, Operational Transforms, CRDT-native sync) all have tradeoffs in bandwidth efficiency, convergence guarantees, and implementation complexity.
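The first of these requirements can be made concrete with the simplest CRDT there is, a grow-only counter: each replica tracks its own count, and merge takes the per-replica maximum. A minimal sketch:

```python
def merge(a, b):
    """G-Counter merge: per-replica max. Commutative, associative, idempotent."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def value(counter):
    """The counter's value is the sum over all replicas."""
    return sum(counter.values())

# Two devices increment independently while offline...
phone = {"phone": 3}
laptop = {"laptop": 2, "phone": 1}  # laptop last saw phone's count at 1

# ...and merge order does not matter once they reconnect.
assert merge(phone, laptop) == merge(laptop, phone) == {"phone": 3, "laptop": 2}
print(value(merge(phone, laptop)))  # 5
```

The hard part is not this mechanism but, as noted above, designing CRDT models rich enough to capture real application semantics rather than counters and sets.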
No standard protocol for local-first sync exists. Applications use custom solutions, or libraries like Automerge and Yjs (which implement specific CRDT-based approaches). The opportunity is for a general-purpose sync protocol that:
- Works over any transport (HTTP, WebSockets, Bluetooth, local network)
- Handles partial connectivity (sync with what’s available)
- Provides strong eventual consistency guarantees
- Composes with standard authentication and authorization mechanisms
This is a protocol design problem that nobody has solved in a general way.
Reactive APIs and Push-Based Truth
The request-response model has the client asking questions and the server answering them. An alternative model has the server asserting truths and the client subscribing to changes.
Differential Dataflow (from Frank McSherry’s work) and reactive databases like Materialize implement something close to this: you write a SQL query, and rather than evaluating it once and returning a result, the system evaluates it continuously and pushes incremental updates when the underlying data changes. The client subscribes to a query’s result set; the server pushes +row and -row deltas as the result changes.
This flips the API model: the client does not poll for current state; the current state is pushed to the client continuously as it changes. Bandwidth usage is proportional to the rate of change rather than the frequency of polling. Stale data disappears as a category: the client always sees the current state because the server updates it in real time.
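The core server-side operation in this model is computing the delta between two versions of a query's result set. A toy sketch (rows and statuses are illustrative), treating each result as a set of rows and emitting retractions and additions:

```python
def delta(old_rows, new_rows):
    """Incremental view delta: rows to retract (-) and rows to add (+)."""
    old, new = set(old_rows), set(new_rows)
    return ([("-", r) for r in sorted(old - new)] +
            [("+", r) for r in sorted(new - old)])

# A subscribed query's result changes on the server...
v1 = {("order-1", "PENDING"), ("order-2", "PENDING")}
v2 = {("order-1", "SHIPPED"), ("order-2", "PENDING")}

# ...and only the delta crosses the wire, not the full result set.
for op, row in delta(v1, v2):
    print(op, row)
```

Real systems like Differential Dataflow compute these deltas incrementally from the input changes rather than by diffing full result sets, which is what makes continuous evaluation cheap; the wire-level idea of shipping +row/-row pairs is the same.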
The challenge is reconciling this model with existing network infrastructure. WebSockets can carry the delta stream, but the semantics — subscribing to a query result rather than a topic — require server-side query evaluation infrastructure that is not standard. Systems like Supabase’s Realtime and Fauna’s reactive queries implement specific variants of this model, but there is no protocol standard.
A future protocol in this space might define: a query language (probably something SQL-like), a format for incremental result set deltas, semantics for subscription management and query cancellation, and connection recovery with state reconciliation. It would likely require differentiated server infrastructure — not just a message broker but a reactive query engine.
P2P API Models
Every protocol in this book assumes a client-server topology. One side serves; the other side consumes. But the rise of edge computing, local-area network applications, and systems built on libp2p (the protocol stack underlying IPFS) suggests that peer-to-peer API models may become more relevant.
In a P2P API model, there is no fixed server. Any node can call methods on any other node; discovery happens through distributed mechanisms (distributed hash tables, gossip protocols, mDNS for local networks). Content addressing (routing to where specific data is, rather than to a specific machine) replaces location addressing.
The practical use cases today are niche: distributed applications, peer-to-peer games, local network device communication, censorship-resistant systems. But the boundary between “edge server” and “powerful client device” is narrowing. A future where a mobile device does computation for its peers in a local mesh, and the concept of “calling an API” means invoking computation on whichever device in your local network has the capability, is not science fiction — it is the direction that WebRTC’s data channels and libp2p’s protocol stack are pointed at.
The protocol design challenge is significant: discovery, authentication, capability advertisement, load distribution, and failure handling all work differently in a peer-to-peer topology than in a client-server topology.
The Compression Opportunity
This is less speculative than the others, but important: serialization and compression are separate concerns in current designs, and combining them more tightly leaves real room for improvement.
Protocol Buffers compresses well with general-purpose compressors (gzip, Brotli, zstd) because its binary format lacks the redundancy of JSON text. But general-purpose compressors do not know anything about the data’s schema. A schema-aware compressor that knows field types, probable value distributions, and correlation between fields could achieve much better compression ratios than a schema-agnostic one.
Columnar formats (Apache Parquet, Apache Arrow) already exploit schema knowledge for batch data — they store all values for a column together, enabling much better compression because similar values are adjacent. For API protocols, the equivalent would be batching multiple messages and applying columnar compression across the batch — storing all amount_cents values together, all currency codes together — before transmission.
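The row-versus-column difference is easy to demonstrate even with JSON, before any compressor runs. A sketch with a hypothetical batch of payment events: serializing row by row repeats every field name a thousand times, while grouping values by column stores each name once:

```python
import json
import zlib

# A batch of similar messages (hypothetical payment events).
rows = [{"amount_cents": 1000 + i, "currency": "USD", "status": "SUCCESS"}
        for i in range(1000)]

# Row-oriented: serialize each message, concatenate.
row_bytes = b"".join(json.dumps(r).encode() for r in rows)

# Column-oriented: group all values for each field together.
columns = {k: [r[k] for r in rows] for k in rows[0]}
col_bytes = json.dumps(columns).encode()

print("row:", len(row_bytes), "col:", len(col_bytes))
print("row+zlib:", len(zlib.compress(row_bytes)),
      "col+zlib:", len(zlib.compress(col_bytes)))
```

Exact numbers depend on the data and compressor, but the columnar layout starts smaller before compression and gives the compressor longer runs of similar values to work with; schema-aware encoding (per-field type and value-distribution knowledge) is the further step that general-purpose compressors cannot take.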
Some specialized systems do this already. No general-purpose API protocol does. The latency cost of batching (you must accumulate messages before compressing) makes this tradeoff only worthwhile for high-throughput pipes where bandwidth is the bottleneck, but at sufficient scale, the bandwidth savings could justify the latency cost.
What the Constraints Tell Us
Looking at the direction of these open problems, a few common themes emerge.
The assumption of reliable, low-latency connectivity will continue to weaken. Mobile, edge, IoT — all push toward protocol designs that handle intermittent connectivity, disconnected operation, and network transitions as first-class concerns rather than edge cases. Protocols designed around the assumption of a stable data center connection will increasingly be wrong choices for a growing fraction of deployments.
The distinction between data and computation will blur. Reactive query subscriptions, local-first sync, and distributed compute all challenge the “dumb pipe with smart endpoints” model. Future protocols may need to carry not just data but queries, continuations, and computation artifacts.
Security and authorization will need to be more tightly integrated with the protocol layer. Cap’n Proto’s capability model is a hint at what this might look like. As systems become more distributed and peer-to-peer, the assumption that the network boundary is the authorization boundary (which underlies most current API security models) breaks down. The authorization model needs to be carried in the protocol, not added as an application layer on top.
These are constraints, not designs. The protocols that emerge from them do not exist yet. But working engineers who understand what today’s protocols are missing are the ones who will be positioned to design the next generation — or at minimum, to recognize and adopt it when someone else does.
How to Choose — A Decision Framework
Every chapter in this book has ended with some version of “it depends.” This chapter does not try to replace judgment with a flowchart. What it does do is make the dependencies explicit — describe what “it depends on” actually means in terms you can evaluate for your specific system.
The framework here is not a decision tree. Decision trees for protocol selection are lies: they look authoritative, they collapse the actual complexity, and they give you someone else’s opinion expressed as objective logic. What follows instead is a set of dimensions along which protocols differ, with honest characterizations of where each one fits. Apply these dimensions to your specific context.
Dimension 1: Communication Pattern
This is usually the most decisive dimension. What is the fundamental shape of the conversation between your components?
Request-response (client initiates, server answers once): This is the pattern that REST, gRPC unary, Thrift, Twirp, and JSON-RPC are all designed for. It matches: user actions, data retrieval, commands where you need a synchronous acknowledgment. It mismatches: any pattern where you need the server to push data, where multiple responses are expected from one request, or where the client needs to stream data to the server.
Server push (server initiates, client receives): SSE is the simplest implementation. WebSocket server-to-client streaming works. gRPC server streaming works. MQTT with a retained message or subscription works. AMQP consumer works. The question is whether you also need the client to send concurrent data — if not, SSE is often the simplest and most robust choice.
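Part of why SSE is often the simplest choice is that its wire format is plain text. A sketch of formatting one event per the SSE conventions (the payload and event name are illustrative):

```python
def sse_event(data, event=None, event_id=None):
    """Format one server-sent event: optional event/id fields, data lines,
    terminated by a blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    if event_id:
        lines.append(f"id: {event_id}")
    lines.extend(f"data: {line}" for line in data.splitlines())
    return "\n".join(lines) + "\n\n"  # blank line terminates the event

print(sse_event('{"price": 101.5}', event="tick", event_id="42"), end="")
```

The id field is what enables SSE's built-in reconnection story: the browser resends the last seen id in a Last-Event-ID header, and the server can resume the stream from there.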
Full duplex (both sides send, both sides receive simultaneously): WebSockets, gRPC bidirectional streaming, NATS, ZeroMQ with DEALER/ROUTER. The trigger for this is simultaneous independent messaging — not just request-response in both directions, but cases where A sends to B while B is simultaneously sending to A without coordination.
Pub/sub (many publishers, many subscribers, decoupled): MQTT, AMQP topic exchanges, NATS, Kafka. The defining characteristic is that publishers do not know who their subscribers are. This decoupling enables patterns that request-response cannot: broadcasting to many consumers, dynamic consumer topology, subscribers that come and go independently of publishers.
Pipeline/stream processing (ordered, persistent, replayable): Kafka, NATS JetStream, AMQP durable queues. The characteristic is that messages are records to be processed, not ephemeral events to be delivered. The consumer’s offset into the stream matters; replay from a specific point is needed.
Getting this wrong is expensive. Trying to implement pub/sub over REST (polling) or trying to do complex routing with WebSockets (rolling your own AMQP) both result in large amounts of application code that re-implements protocol features.
Dimension 2: Delivery Guarantees
What does your system need, and what can it tolerate?
At-most-once (fire and forget): Acceptable when: the cost of losing a message is low and the cost of duplicate processing is high, or when throughput requirements make acknowledgment overhead unacceptable. Appropriate for: high-frequency telemetry, real-time game state, logs where missing some entries is acceptable. HTTP without retries is at-most-once. MQTT QoS 0. ZeroMQ’s default behavior.
At-least-once: Acceptable when: processing is idempotent (the same message processed twice has the same result as processing it once). Appropriate for: most event-driven architectures where messages represent distinct events that should be processed but duplicate processing is harmless. MQTT QoS 1, AMQP with acknowledgment, Kafka with at-least-once semantics, most message queuing systems.
Exactly-once: Needed when: processing is not idempotent and duplicates cause real problems (sending an email twice, charging a payment twice). Hard to achieve in distributed systems — requires coordination between the sender, the transport, and the receiver. MQTT QoS 2, AMQP transactions, Kafka exactly-once semantics (with caveats), idempotency keys in REST APIs. Every “exactly-once” guarantee comes with caveats about what “exactly once” means in the face of specific failure modes.
A practical note: exactly-once is more expensive than at-least-once, which is more expensive than at-most-once — in latency, in overhead, and in implementation complexity. Many systems that claim to need exactly-once delivery actually need at-least-once with idempotent processing, which is simpler and cheaper.
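The at-least-once-plus-idempotency pattern is worth seeing in miniature. In this sketch (names hypothetical, with an in-memory dict standing in for durable storage), a redelivered message returns the stored result instead of charging twice:

```python
import uuid

processed = {}  # idempotency key -> result (durable storage in a real system)

def charge(amount_cents, currency, idempotency_key):
    """At-least-once delivery plus idempotent processing: redelivery is harmless."""
    if idempotency_key in processed:
        return processed[idempotency_key]  # duplicate: return the stored result
    result = {"transaction_id": str(uuid.uuid4()), "status": "SUCCESS"}
    processed[idempotency_key] = result
    return result

first = charge(1000, "USD", "req-123")
retry = charge(1000, "USD", "req-123")  # broker redelivered the message
assert first == retry  # exactly-once *effect* over at-least-once delivery
```

The subtlety in production is making the key-check and the side effect atomic and durable together; but that is a storage-layer transaction, far cheaper than distributed exactly-once coordination across sender, transport, and receiver.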
Dimension 3: Latency and Throughput Requirements
These are related but distinct, and they point toward different protocol choices.
Latency-critical (sub-millisecond, microsecond budgets): You are in the territory of ZeroMQ, custom binary protocols, or Cap’n Proto. HTTP is not a contender. The broker adds a hop; the hop adds latency. Any protocol with a broker is likely disqualified. The question is how much engineering you are willing to invest in eliminating overhead.
High-throughput (millions of messages/second, bandwidth-constrained): Binary serialization (Protocol Buffers, Cap’n Proto, FlatBuffers) matters here. JSON’s parse cost is a real fraction of CPU at this scale. Connection pooling, batching, and compression become important. Kafka’s design is optimized for this — sequential disk writes, batch transmission, efficient consumer group management.
Moderate throughput, normal latency (most applications): You have more freedom. The overhead of JSON is not your bottleneck; REST or gRPC both work. Choose based on other dimensions.
Low-throughput, constrained bandwidth (IoT, satellite links): MQTT’s compact framing and session management were designed specifically for this. The per-message overhead of any HTTP-based protocol is a real cost at this bandwidth constraint.
Dimension 4: Client Diversity
What clients will consume this API, and what are their constraints?
Public API consumed by arbitrary clients: REST/HTTP is the strong default. The tooling, documentation ecosystem, and universal support for HTTP make any other choice a barrier for external developers. Binary protocols require client library support in every language a consumer might use; HTTP is already supported everywhere.
Browser clients: HTTP/REST or gRPC-Web (with proxy overhead) or WebSockets/SSE for push. No client-streaming gRPC. No raw TCP. No MQTT without an HTTP bridge. The browser sandbox is the constraint.
Mobile clients (iOS, Android): Most networking options are available (HTTP, WebSockets, gRPC). The relevant constraint is battery: maintaining a persistent connection for push consumes more power than polling. MQTT's LWT and persistent session model were designed specifically with intermittently connected, power-constrained clients in mind. If battery life matters, the connection management model of your chosen protocol matters.
IoT / embedded devices: MQTT dominates for a reason. The compact framing, low connection overhead, and built-in network-failure handling are not niceties — they determine whether the application is feasible on constrained hardware.
Internal service-to-service: You control both ends, so any protocol is viable. The question shifts to operational concerns and team capability. gRPC is a strong default for polyglot microservices. AMQP is a strong default for work queues. NATS is a reasonable alternative for teams that want lower operational overhead.
Dimension 5: Failure Handling and Durability
What happens when things go wrong, and what does your system require?
Loss of a component is tolerable: Stateless HTTP fits. If the server that handled your request goes down, you retry against another server. This works because the server held no conversational state.
Loss of a message is catastrophic: You need durability: AMQP durable queues with persistent messages, Kafka's replicated logs, or application-level storage with idempotency keys. HTTP alone is not durable: a 200 response means the server accepted the request, but if the server crashes after responding and before writing to durable storage, the work is lost.
Partial connectivity is expected (mobile, IoT): Session persistence across disconnections is valuable. MQTT’s clean/persistent session, WebSocket reconnection with state replay, or application-level sequence numbers with gap detection. Protocols without reconnection semantics require application code to handle this.
Fan-out to many consumers: AMQP fanout exchanges, MQTT subscriptions, NATS subject matching. Naive HTTP requires N requests to N consumers; a pub/sub system requires one publish and the broker (or protocol library) handles fan-out.
Dimension 6: Operational Complexity
This dimension is often underweighted because it is not visible during the design phase. It becomes very visible at 2 AM when something breaks.
No infrastructure beyond the application: REST over HTTP, WebSockets/SSE, gRPC, ZeroMQ, Twirp. The protocol lives in the application; there is no broker to deploy, monitor, or maintain.
Single broker with straightforward operation: Mosquitto for MQTT. A single RabbitMQ instance. A single NATS server. These are manageable for small teams. They add a failure domain — the broker can fail — but the operational surface is understandable.
Clustered broker with operational depth: Kafka, EMQ X, HiveMQ cluster, RabbitMQ cluster. These require deeper operational knowledge: cluster configuration, partition management, replication factors, consumer group rebalancing, monitoring of broker internals. The capabilities they provide are worth the cost at scale; at small scale, the cost exceeds the benefit.
Service mesh and proxy infrastructure: gRPC at scale requires L7 load balancing — Envoy, Istio, or similar. This is not just “deploy a proxy”; it is adopting an infrastructure paradigm that touches every service. The benefits (observability, traffic management, mTLS) are real at large scale. For small deployments, the overhead is not justified.
Be honest about your team’s operational capacity. A protocol that is theoretically optimal but operationally complex beyond your team’s ability to maintain is a liability, not an asset. The right answer for a two-person startup is different from the right answer for a platform team at a large organization.
Dimension 7: Schema and Contract
How important is the formal definition of your API’s interface?
Schema-first with code generation: gRPC + Protocol Buffers, Thrift, Cap’n Proto. The schema is the source of truth; clients and servers are generated from it. Type mismatches are caught at compile time. Schema evolution is governed by the schema format’s rules.
Schema-first with validation only: OpenAPI/Swagger for REST, JSON Schema for WebSocket payloads. The schema defines the contract but clients are not generated from it (or if they are, the generated clients are less tightly bound than with protobuf-generated code). Drift between schema and implementation is possible.
Schemaless: Raw WebSockets, ZeroMQ without a payload schema, MQTT without a payload schema. The contract is in the documentation. Runtime errors, not compile-time errors, reveal mismatches. Appropriate when flexibility outweighs safety (rapidly evolving APIs, heterogeneous clients with different capabilities).
For internal services, schema-first with code generation is almost always worth the investment. The class of bugs it eliminates — mismatched field names, type coercions, missing required fields — is large enough that the upfront cost pays back quickly. For external APIs where clients are outside your control, the tradeoff depends on how many clients you have and how much you want to invest in supporting them.
Putting It Together
Rather than a decision tree, here are characterizations of common situations:
Web application with a REST backend and a need for real-time updates: Add SSE for server push. If you also need client-to-server streaming or complex bidirectional interaction, upgrade to WebSockets on those specific endpoints. Do not replace your REST API; add streaming alongside it.
Microservices communicating internally in a polyglot environment: gRPC is the default. The protobuf schema contract and generated clients eliminate entire categories of integration bugs. The operational overhead of L7 load balancing is real, but it is manageable with Kubernetes and a service mesh.
IoT device telemetry at scale: MQTT. The hardware constraints, connection model, and QoS semantics were all designed for exactly this.
Work queue for background processing with reliability guarantees: AMQP (RabbitMQ). Dead letter exchanges, acknowledgment semantics, and queue durability handle the reliability requirements. If you need replay and high throughput, evaluate Kafka.
High-frequency inter-process messaging on the same host: ZeroMQ with inproc:// or ipc:// transport, or a shared-memory queue. The network stack adds latency that you do not need.
Event streaming with downstream consumers at their own pace: Kafka or NATS JetStream. The persistent log, offset tracking, and consumer groups are what distinguish this from pub/sub.
Public API for third-party developers: REST over HTTP with OpenAPI documentation. Developer experience wins. The marginal performance improvement of anything else is not worth the friction you are adding for your consumers.
The Decision You Are Actually Making
When you choose a protocol, you are not just choosing a wire format. You are choosing:
- What failure modes you are responsible for handling (vs. what the protocol handles)
- What your monitoring and debugging story looks like
- What your team needs to know to operate the system
- What your clients need to know to consume your service
- What happens when your traffic grows by 10x
The best protocol for your system is the one that matches these constraints, not the one that scores highest on abstract technical criteria. A system built on the right protocol for its constraints is easier to reason about, easier to operate, and more reliable than one built on a technically superior protocol that does not match the team’s capabilities or the system’s operational context.
The goal of this book has been to give you enough information to make this choice deliberately — to be the engineer who chose REST because REST was right, not because REST was the default; who chose gRPC because the polyglot contract enforcement justified the operational complexity, not because gRPC was what the last job used; who chose MQTT because the device constraints demanded it, not because “IoT uses MQTT.”
That deliberate choice — made with full information, owned by the engineers who made it — is the difference between a system you understand and a system you inherited.
Acknowledgments
Thanks to Georgiy Treyvus, CloudStreet Product Manager, whose idea started this book.