
Why Everyone Just Copies What Kafka Does

There’s a pattern in distributed systems engineering that goes like this: a team faces a problem involving reliable data movement between services. They consider their options. Someone mentions Kafka. The room divides into two camps: those who say “let’s just use Kafka” and those who say “let’s build something simpler — like Kafka, but lighter.” Both camps end up implementing some variant of Kafka’s design. One just does it with more steps.

This chapter is about why that happens. Not because Kafka is perfect — it isn’t — but because Kafka landed on a set of design decisions that turn out to be very hard to improve upon for a surprisingly wide range of problems. Understanding what those decisions are and why they work tells us something important about the relationship between theoretical consensus and practical systems.

The Log-Centric Worldview

Kafka’s foundational insight, articulated by Jay Kreps in his 2013 blog post “The Log: What every software engineer should know about real-time data’s unifying abstraction,” is that an append-only log is the most natural primitive for distributed data infrastructure.

This isn’t a new idea — databases have used write-ahead logs since the 1970s, and Lamport’s state machine replication framework is fundamentally about replicating a log of commands. What Kafka did was elevate the log from an implementation detail to the primary abstraction. In Kafka, the log isn’t something that happens behind the scenes to support replication — the log IS the product. Producers append to it. Consumers read from it. That’s the API.

This has several consequences that cascade through the entire system design:

Decoupling of producers and consumers. Because the log is persistent, producers don’t need to know about consumers and consumers don’t need to coordinate with producers. A producer writes a record and moves on. A consumer reads at its own pace. If a consumer falls behind, the log retains the data (up to the retention limit). If a consumer crashes, it resumes from where it left off. This is publish-subscribe done right, and it eliminates an enormous class of coordination problems.

Replay as a first-class operation. Because the log is immutable and indexed by offset, any consumer can re-read from any point. This turns what would be an exceptional operation in a traditional message queue (re-processing old messages) into a routine operation. Need to reprocess yesterday’s events because you deployed a bug? Reset the consumer offset. Need to bootstrap a new service with historical data? Read from the beginning of the topic. This capability is so valuable that teams build entire architectures around it (event sourcing, CQRS, the “streaming ETL” pattern).

Natural ordering guarantee. Within a partition, records have a total order defined by their offset. This is exactly the guarantee that most applications need — events for a given entity (user, account, device) are ordered, while events for different entities can be processed independently. The partition-level ordering maps cleanly to the application-level requirement without over-ordering (total order across all events) or under-ordering (no ordering at all).

Offset-based consumption model. Consumers track their position in the log via an offset (a simple integer). This is dramatically simpler than the acknowledgment models in traditional message queues, where individual message acknowledgment creates the need for complex state tracking, redelivery logic, and dead letter queues. In Kafka, advancing your offset means “I have processed everything up to here.” It’s idempotent, it’s resumable, and it compresses the consumer’s state to a single number per partition.
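The log, keyed partitioning, and offset-based consumption described above can be sketched in a few lines of plain Python. This is a toy model for illustration, not Kafka's actual implementation; all class and method names are invented here:

```python
import hashlib

class Topic:
    """Toy partitioned log: append-only lists indexed by offset."""
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key -> same partition -> per-key ordering is preserved.
        p = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

class Consumer:
    """Consumer state is a single integer offset per partition."""
    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self, partition, max_records=10):
        start = self.offsets[partition]
        records = self.topic.partitions[partition][start:start + max_records]
        # Advancing the offset means "I have processed everything up to here."
        self.offsets[partition] += len(records)
        return records

    def seek(self, partition, offset):
        # Replay is just resetting an integer.
        self.offsets[partition] = offset
```

Notice that crash recovery and replay fall out of the model for free: persist the offset, and resuming or reprocessing is a single `seek`.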

These properties aren’t unique to Kafka — any append-only log has them. But Kafka was the system that packaged them into an operational, scalable, production-ready product and demonstrated that you could build a company’s entire data infrastructure around them. That packaging matters more than the theory.

What Kafka Actually Does Well

Let’s be specific about where Kafka excels, because “Kafka is good” is too vague to be useful.

Throughput

Kafka achieves high throughput through several design decisions that prioritize sequential I/O over everything else:

  • Append-only writes. All writes are sequential appends. No random I/O for inserts, no B-tree rebalancing, no compaction (unless you use log compaction, which is a separate beast). Sequential disk I/O on modern hardware can sustain 500MB/sec+ on a single disk, and Kafka exploits this fully.

  • Zero-copy transfers. When a consumer reads data, Kafka uses the sendfile system call (zero-copy) to transfer data directly from the page cache to the network socket without copying through user space. This eliminates one of the biggest CPU bottlenecks in data transfer.

  • Batching everywhere. Producers batch records before sending. The broker batches records before writing to disk. Consumers fetch batches of records. Compression is applied at the batch level. Every layer is optimized for amortizing overhead across many records.

  • Page cache reliance. Kafka deliberately uses the OS page cache rather than managing its own buffer pool. This means that recent data (which is what most consumers are reading) is served from RAM without Kafka doing any caching logic. It also means Kafka’s heap usage is low, reducing GC pressure — a non-trivial concern for a long-running JVM process.
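The batching idea can be made concrete with a minimal sketch of a producer that flushes on batch size or a linger timeout, loosely mirroring Kafka's batch.size and linger.ms configs. This is a simplified illustration, not the real client's logic:

```python
import time

class BatchingProducer:
    """Toy producer that amortizes per-send overhead by batching,
    flushing when the batch is full or has lingered long enough."""
    def __init__(self, send_fn, max_batch=100, linger_ms=5):
        self.send_fn = send_fn          # callable that takes a list of records
        self.max_batch = max_batch
        self.linger_s = linger_ms / 1000
        self.batch = []
        self.first_append = None

    def produce(self, record):
        if not self.batch:
            self.first_append = time.monotonic()
        self.batch.append(record)
        if (len(self.batch) >= self.max_batch or
                time.monotonic() - self.first_append >= self.linger_s):
            self.flush()

    def flush(self):
        if self.batch:
            self.send_fn(self.batch)    # one network round trip per batch
            self.batch = []
```

The tradeoff is visible in the two parameters: a larger batch or longer linger raises throughput and per-record efficiency at the cost of latency.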

The result is that a single Kafka broker can sustain hundreds of thousands of messages per second, and a well-configured cluster can handle millions. These are real numbers, not microbenchmark fantasies.

Consumer Groups

Kafka’s consumer group mechanism is one of its most copied features, and for good reason. A consumer group provides:

  • Automatic partition assignment. Partitions are distributed among consumers in the group. If you have 12 partitions and 3 consumers, each consumer gets 4 partitions. Add a 4th consumer and the partitions rebalance to 3 each.

  • Automatic failover. If a consumer crashes, its partitions are reassigned to surviving consumers. This provides fault tolerance at the consumption layer without the application implementing any failure detection.

  • Parallel consumption with ordering. Each partition is consumed by exactly one consumer in the group, maintaining per-partition ordering while allowing parallel processing across partitions.
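The assignment arithmetic above can be sketched with a toy round-robin assignor (Kafka ships several assignment strategies; this simplified version only shows the even-distribution property):

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment: distribute partitions evenly across
    the sorted member list, as a group coordinator might."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment
```

Running it with 12 partitions and 3 consumers gives 4 partitions each; adding a 4th consumer and recomputing gives 3 each, and the difference between the two assignments is exactly what a rebalance must move.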

The consumer group protocol has gone through several iterations (the original ZooKeeper-based coordination, the broker-based group coordinator, cooperative rebalancing, static membership, server-side assignors), each addressing real operational pain points. The evolution reflects the reality that getting distributed consumption right is genuinely hard, and Kafka has been iterating on it for over a decade.

Retention and Replay

Kafka’s retention model — keep all messages for a configurable time or size limit, regardless of whether they’ve been consumed — was radical when Kafka was introduced. Traditional message queues deleted messages after acknowledgment, treating the queue as a transient buffer. Kafka treats the log as a database of events.

This enables patterns that are impossible or awkward with traditional queues:

  • Multiple independent consumers. Each consumer group has its own offset. Ten different services can independently consume the same topic at their own pace without interfering with each other.

  • Backfilling new consumers. A new service that needs historical data can consume from the beginning of the topic, processing days, weeks, or months of events to build up its state.

  • Operational replay. Deployed a bug that corrupted your downstream database? Reset the consumer offset and reprocess. This has saved more engineering teams than any monitoring system.
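All three patterns rest on the same mechanism: committed offsets are keyed by consumer group, so groups never interfere. A toy sketch (names illustrative):

```python
class OffsetStore:
    """Toy offset store: each consumer group tracks its own committed
    offset per partition, so groups read the same log independently."""
    def __init__(self):
        self.committed = {}   # (group, partition) -> next offset to read

    def commit(self, group, partition, offset):
        self.committed[(group, partition)] = offset

    def fetch(self, group, partition):
        # A group with no committed offset starts from the beginning.
        return self.committed.get((group, partition), 0)

log = ["e0", "e1", "e2", "e3"]
store = OffsetStore()
store.commit("billing", 0, 4)      # billing is fully caught up
store.commit("analytics", 0, 1)    # analytics lags; nobody is blocked
store.commit("billing", 0, 0)      # operational replay: reset and reprocess
```

Backfilling a new consumer is the degenerate case: a group that has never committed simply starts at offset 0.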

The ISR Model’s Quiet Influence

Chapter 16 covered Kafka’s ISR (In-Sync Replica) model in detail. Here, we’ll focus on its influence beyond Kafka.

The ISR model’s key insight is that you don’t need a fixed quorum. Instead, you maintain a set of replicas that are “in sync” with the leader, and you only require acknowledgment from this dynamic set. Replicas that fall behind are removed from the ISR and re-added when they catch up.

This is not consensus in the formal sense. It doesn’t provide the same guarantees as Paxos or Raft. Specifically:

  • If the entire ISR is lost simultaneously (all in-sync replicas crash before their data is flushed), data is lost
  • The ISR can shrink to one (the leader), at which point there’s no replication at all
  • The decision about whether to allow an “unclean” leader election (promoting a replica that was behind) is a configuration choice, not a protocol guarantee

But the ISR model has properties that make it attractive for systems that could have used classical consensus but chose not to:

Dynamic membership without consensus. Adding or removing replicas from the ISR doesn’t require a membership-change protocol. The controller tracks ISR membership and updates it based on replication lag. This is operationally simpler than the joint-consensus or single-server-change approaches used by Raft.

Graceful degradation. When a replica falls behind (due to load, slow disk, network issues), the ISR shrinks and the system continues. There’s no complex failure detection — just “is this replica keeping up?” If the ISR shrinks too much (below min.insync.replicas), writes are rejected, which is a clear and understandable failure mode.

Tunable consistency. With acks=all, you get durability to all ISR members. With acks=1, you get durability only to the leader (lower latency, higher risk). With acks=0, you get fire-and-forget. This per-request tunability is more flexible than most consensus protocols, which provide a single consistency level.
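The ISR bookkeeping described above fits in a short sketch: replicas that lag beyond a threshold drop out, catching up re-adds them, and acks=all writes are rejected when the ISR is too small. This toy model borrows Kafka's config names (min.insync.replicas, acks) but simplifies everything else — real brokers track lag by time, not message count:

```python
class Partition:
    """Toy ISR bookkeeping for one partition."""
    def __init__(self, replicas, leader, max_lag=2, min_insync=2):
        self.leo = {r: 0 for r in replicas}   # log end offset per replica
        self.leader = leader
        self.max_lag = max_lag
        self.min_insync = min_insync
        self.isr = set(replicas)

    def _update_isr(self):
        # "In sync" = within max_lag messages of the leader. Shrinking and
        # growing the set needs no membership-change protocol.
        lead = self.leo[self.leader]
        self.isr = {r for r in self.leo if lead - self.leo[r] <= self.max_lag}

    def replicate(self, replica, offset):
        self.leo[replica] = offset
        self._update_isr()            # a caught-up replica rejoins automatically

    def append(self, acks="all"):
        if acks == "all" and len(self.isr) < self.min_insync:
            raise RuntimeError("NOT_ENOUGH_REPLICAS")   # clear failure mode
        self.leo[self.leader] += 1
        self._update_isr()
```

The failure mode the formal-consensus critique worries about is visible here too: with min_insync=1, the ISR can shrink to the leader alone and writes still succeed.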

The influence of this model shows up in systems you might not expect. MongoDB’s replica sets use a similar write-concern model (write to primary, wait for replication to secondaries, configurable acknowledgment level). Amazon Aurora uses a quorum-based model but with a storage layer that independently manages replica health. The pattern of “dynamic replica set with tunable durability” has become the default approach for systems that need replication but don’t want the full weight of formal consensus.

Kafka’s Approach to Exactly-Once Semantics

For years, Kafka’s answer to “does Kafka support exactly-once?” was “no, but at-least-once is good enough for most use cases.” This was honest but frustrating.

Then, in 2017, KIP-98 introduced exactly-once semantics (EOS) for Kafka. The implementation required:

  1. Idempotent producers. Each producer is assigned a producer ID, and each message within a producer session is assigned a sequence number. The broker deduplicates based on (producer ID, partition, sequence number). This eliminates duplicates from producer retries.

  2. Transactional writes. For writes spanning multiple partitions (e.g., a stream processing job that reads from partition A and writes to partition B), Kafka implements a two-phase commit protocol using a transaction coordinator. The producer begins a transaction, writes to multiple partitions, and either commits or aborts atomically.

  3. Consumer-side offset management. Consumer offsets are committed as part of the transaction, ensuring that “read input, process, write output, advance offset” is atomic.
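The idempotent-producer piece (step 1) is the simplest to sketch: the broker accepts a record only when its sequence number is the next expected one for that (producer ID, partition). A toy model, not Kafka's actual broker code:

```python
class Broker:
    """Toy sequence-number dedup for idempotent producers."""
    def __init__(self):
        self.log = []
        self.last_seq = {}   # (producer_id, partition) -> last sequence written

    def append(self, producer_id, partition, seq, record):
        key = (producer_id, partition)
        expected = self.last_seq.get(key, -1) + 1
        if seq < expected:
            return "duplicate"   # a retry of an already-written record: drop it
        if seq > expected:
            # A gap means a record was lost in between: fail loudly.
            raise RuntimeError("OUT_OF_ORDER_SEQUENCE")
        self.log.append(record)
        self.last_seq[key] = seq
        return "appended"
```

The interesting case is a producer retrying because its ack was lost: the record is already in the log, the sequence number reveals the retry, and the broker acks without appending a second copy.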

This took years to implement, stabilize, and optimize. The initial EOS release had performance overhead that made many users disable it. Subsequent releases (particularly the Kafka Streams “exactly-once v2” in KIP-447) reduced the overhead to the point where EOS is practical for most workloads.

The lesson here is instructive: exactly-once semantics are not a property of the consensus/replication protocol — they’re a property of the entire system, including producers, consumers, and the transaction coordinator. Anyone who tells you their consensus protocol provides exactly-once semantics is either confused about what they mean or has quietly built a transaction coordinator on top of their consensus protocol.

The KRaft Migration: Kafka Confronts Consensus

For most of its history, Kafka outsourced its own consensus needs to ZooKeeper. The broker cluster used ZooKeeper for:

  • Controller election (which broker is the controller)
  • Broker registration (which brokers are alive)
  • Topic and partition metadata
  • ISR tracking
  • ACL storage

This worked but created a dependency that was operationally painful. Running Kafka meant running two distributed systems — Kafka itself and a ZooKeeper cluster — each with its own failure modes, monitoring requirements, and upgrade procedures. ZooKeeper became the weak link not because it’s a bad system (it’s not), but because most Kafka operators weren’t ZooKeeper experts, and a struggling ZooKeeper cluster manifests as mysterious Kafka problems.

KRaft (Kafka Raft) is Kafka’s project to replace ZooKeeper with a built-in Raft-based metadata quorum. The key details:

  • A set of broker nodes (or dedicated controller nodes) form a Raft quorum for metadata management
  • The metadata is stored as an event log (naturally), replicating Kafka’s own log-centric philosophy
  • The Raft implementation is specifically optimized for Kafka’s needs (batched writes, snapshot-based state transfer)

The KRaft migration tells us several things about Kafka’s relationship with consensus:

Kafka always needed consensus — it just outsourced it. The ISR model handles data replication, but Kafka still needs consensus for metadata: who’s the controller, what’s the partition assignment, what’s the current ISR. This metadata must be consistent across all brokers. ZooKeeper provided that consensus. KRaft provides it in-process.

Raft was the obvious choice. When the Kafka team needed to implement consensus, they chose Raft — not because it’s the theoretically optimal protocol, but because it’s the most well-understood and implementable one. Even a team as experienced as the Kafka committers, with deep knowledge of consensus theory, chose the pragmatic option.

The migration is painful. Replacing a running system’s consensus layer while maintaining availability is, to use a technical term, extremely hard. The KRaft migration has been in progress since 2020, with ZooKeeper finally deprecated in 3.5 (2023) and full ZooKeeper removal targeted for 4.0. A multi-year migration for a foundational component is not unusual — it’s expected. This should calibrate your expectations for “we’ll just swap out the consensus layer later.”

A Comparison of Replication Models

To understand why Kafka’s ISR model has been so influential, it helps to see it side by side with classical consensus approaches as applied to log replication.

| Property | Raft-style Replication | Kafka ISR | Flexible Paxos |
|---|---|---|---|
| Quorum definition | Fixed majority (n/2 + 1) | Dynamic (all in-sync replicas) | Configurable (Phase 1 and Phase 2 quorums) |
| Quorum membership | Static (requires reconfiguration to change) | Dynamic (ISR shrinks/grows automatically) | Static (requires reconfiguration) |
| Minimum write quorum | Majority of cluster | min.insync.replicas (configurable) | Phase 2 quorum size (configurable) |
| Failure detection | Heartbeat timeout → leader election | Replication lag → ISR removal | Heartbeat timeout → Phase 1 |
| Recovery after failure | Follower replays from leader’s log | Broker catches up and rejoins ISR | Follower replays from leader’s log |
| Availability during failures | Available if majority alive | Available if ISR >= min.insync.replicas | Depends on quorum configuration |
| Consistency guarantee | Linearizable | Linearizable (with acks=all and min.insync.replicas >= 2) | Linearizable |
| Cost of adding a replica | Reconfiguration protocol | Just start replicating (auto-joins ISR when caught up) | Reconfiguration protocol |
| Operational overhead | Moderate (fixed membership, explicit reconfiguration) | Low (ISR is self-managing) | High (must understand quorum intersection requirements) |

The table reveals why ISR is attractive for operators: it’s self-managing. A slow replica drops out of the ISR automatically. A recovered replica rejoins automatically. You don’t need to run a reconfiguration procedure — the system adapts. For a system like Kafka, where brokers routinely experience load spikes, garbage collection pauses, and rolling restarts, this adaptability is operationally invaluable.

The formal consensus community would note that ISR’s “automatic” behavior is precisely what makes it less safe — the ISR can shrink to one node without any operator approval, and if that one node fails, data is lost. This is true, and it’s why min.insync.replicas exists. But the fact that ISR requires one configuration parameter to be safe, while Raft requires correct implementation of a reconfiguration protocol to be flexible, tells you something about where each approach puts its complexity budget.

Systems That Successfully Copied Kafka

Several systems have taken Kafka’s design principles and adapted them, some more successfully than others.

Apache Pulsar

Pulsar separated the compute layer (brokers) from the storage layer (BookKeeper). This is arguably an improvement on Kafka’s architecture, where brokers are both compute and storage nodes. Pulsar’s approach allows independent scaling of serving capacity and storage capacity.

Pulsar uses BookKeeper’s quorum-write protocol for replication, which is similar to the ISR model but with explicit write quorums and ack quorums (similar in spirit to Flexible Paxos). A message is written to W replicas and considered committed when A replicas acknowledge (where A <= W).
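The write-quorum/ack-quorum split can be sketched abstractly. This toy function (all names invented here) just shows the commit rule: send the entry to W replicas, ack the client once A of them confirm:

```python
def quorum_write(replicas, write_quorum, ack_quorum, write_fn):
    """Toy BookKeeper-style write: send to write_quorum replicas and
    report success once ack_quorum of them acknowledge (A <= W)."""
    assert ack_quorum <= write_quorum <= len(replicas)
    acks = 0
    for r in replicas[:write_quorum]:
        if write_fn(r):          # True if the replica durably stored the entry
            acks += 1
        if acks >= ack_quorum:
            return True          # committed: safe to ack the client
    return False                 # not enough acknowledgments
```

Setting A below W buys latency tolerance — the slowest replica in the write set no longer gates the commit — at the cost of a smaller durable copy count.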

Where Pulsar differs from Kafka:

| Feature | Kafka | Pulsar |
|---|---|---|
| Storage architecture | Broker = storage | Separated (broker + BookKeeper) |
| Replication | ISR (dynamic) | Quorum writes (configured W and A) |
| Multi-tenancy | Limited (quotas) | First-class (namespaces, quotas, isolation) |
| Geo-replication | MirrorMaker (async) | Built-in (async) |
| Consumer model | Pull-based | Pull and push |
| Message acknowledgment | Offset-based | Individual or cumulative |

Pulsar demonstrates that you can take the log-centric model and make different implementation choices while preserving the core benefits. The question most teams face isn’t “is Pulsar better than Kafka?” but “is Pulsar enough better to justify the smaller community and ecosystem?”

Redpanda

Redpanda took a different approach: implement Kafka’s protocol exactly (wire-compatible) but with a C++ implementation using the Seastar framework, eliminating the JVM and its garbage collection overhead.

Redpanda uses Raft for both data replication and metadata management (no ZooKeeper dependency from day one — anticipating Kafka’s own direction). Each partition is backed by a Raft group, which provides stronger guarantees than Kafka’s ISR model but at the cost of the flexibility that ISR provides.

The Redpanda story validates Kafka’s API and operational model while suggesting that the ISR model itself might not be the only viable replication approach. You can use Raft for per-partition replication and get Kafka-compatible behavior with stronger consistency guarantees. The tradeoff is that Raft’s fixed quorum is less flexible than ISR’s dynamic membership — a slow replica in Raft still participates in the quorum, while Kafka simply removes it from the ISR.

Amazon Kinesis

Kinesis is Amazon’s managed streaming service, clearly inspired by Kafka’s model (topics are “streams,” partitions are “shards,” consumers use checkpoints analogous to offsets). The replication and consensus details are hidden behind the managed service boundary, which is both the advantage (you don’t have to care) and the limitation (you can’t tune it).

Kinesis validated that the log-centric model works as a managed service and that most users don’t need or want to think about the replication protocol. They want the abstraction: append records, read records, partition for parallelism, retain for replay.

Systems That Probably Shouldn’t Have Copied Kafka

Not every team that says “we’re building something like Kafka” should be building something like Kafka.

The “Lightweight Kafka” Trap

The conversation usually starts like this:

“Kafka is too heavy for our use case. We just need a simple message queue with persistence and ordering. Let’s build something lightweight.”

Two years later, the team has built:

  • A log-structured storage engine (because they need persistence)
  • A replication protocol (because they need fault tolerance)
  • A consumer group coordinator (because they need parallel consumption)
  • A partition assignment algorithm (because they need scalability)
  • A compaction mechanism (because the disk isn’t infinite)
  • An offset management system (because consumers need to track their position)
  • A metrics and monitoring layer (because they need to operate it)
  • A client library (because applications need to talk to it)
  • A wire protocol (because the client library needs a wire protocol)

They’ve built Kafka. Except it’s less tested, less documented, less understood, and maintained by a team of three instead of a community of thousands.

The “lightweight Kafka” trap springs from a misunderstanding of where Kafka’s complexity comes from. It’s not the JVM (though that adds operational overhead). It’s not ZooKeeper (though that also adds overhead). It’s the fundamental problem space. Any system that provides durable, ordered, fault-tolerant, scalable message delivery will converge on a similar set of mechanisms, and those mechanisms are inherently complex.

Internal Event Buses

Many organizations build internal “event bus” or “event backbone” systems that are spiritually Kafka-like but use a different replication strategy. Sometimes it’s a custom protocol over Redis Streams. Sometimes it’s a PostgreSQL-backed queue with logical replication. Sometimes it’s a hand-rolled TCP server with file-based persistence.

These systems work fine at small scale and become increasingly painful as scale grows. The usual failure mode is that the team discovers, one production incident at a time, all the problems that Kafka has already solved:

  1. What happens when a consumer is slow? (Backpressure and consumer lag monitoring)
  2. What happens when the disk fills up? (Retention policies and segment deletion)
  3. What happens when you need to add a partition? (Partition reassignment)
  4. What happens when the leader dies? (Leader election and ISR management)
  5. What happens when a message is produced but the ack is lost? (Idempotent producers)

Each of these problems takes weeks to months to solve correctly, and the solutions look increasingly like Kafka. By the time you’ve solved them all, you’ve built Kafka with more bugs and fewer features.
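Even the "easy" items hide design decisions. Take retention (problem 2): Kafka deletes whole closed segment files, never individual records, so deletion stays a cheap sequential operation. A toy time-based sketch (field names invented here):

```python
def enforce_retention(segments, now, retention_s):
    """Toy segment deletion: drop closed segments whose newest record is
    older than the retention window; the active (last) segment survives."""
    return [
        seg for i, seg in enumerate(segments)
        if i == len(segments) - 1 or now - seg["max_timestamp"] <= retention_s
    ]
```

The subtlety a hand-rolled queue usually misses is exactly this granularity: per-message deletion turns the store back into a random-I/O database and forfeits the throughput the log bought you.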

When “Not Kafka” Is Actually Right

To be fair, there are legitimate cases where something other than Kafka is appropriate:

Embedded systems / edge devices. Kafka is a server-side system. If you need message ordering on a device with 256MB of RAM, you need something different (an embedded Raft log, a local SQLite-based queue, etc.).

Very small scale with simple requirements. If you have one producer, one consumer, and need a persistent queue, PostgreSQL with SKIP LOCKED is simpler and sufficient. Not everything needs to be a distributed system.

Extreme low latency (< 1ms). Kafka’s batching-oriented design trades latency for throughput. If you need sub-millisecond end-to-end latency, you need a system designed for that (Aeron, custom shared-memory queues, etc.). These systems typically sacrifice durability and fault tolerance for speed.

Total ordering across all events. If you genuinely need a single totally-ordered log (not per-partition ordering), Kafka’s partitioned model doesn’t help. You need a single-partition topic (which is a single-leader bottleneck) or a different system entirely.

The Lessons

Kafka’s outsized influence on distributed systems design carries several lessons that extend beyond Kafka itself.

Lesson 1: “Good Enough” Consensus Often Beats “Correct” Consensus

Kafka’s ISR model is not consensus in the formal, provably-correct sense. It doesn’t provide the same guarantees as Raft or Paxos. It has corner cases (unclean leader election, ISR shrinking to one) where data loss is possible.

And yet, Kafka handles more data reliably than probably any other distributed system in existence. How?

Because the remaining failure modes — correlated failures that take out the entire ISR, unclean leader elections that are now disabled by default — are rare enough that the practical reliability is extremely high. And the system provides operational controls (monitoring ISR size, alerting on under-replicated partitions, configuring min.insync.replicas) that let operators manage the residual risk.

This is a different philosophy from formal consensus. Formal consensus says “prove that no execution can violate safety.” Kafka’s approach says “make the unsafe executions rare enough that operational monitoring can catch them.” It’s not as theoretically satisfying, but for the vast majority of use cases, the difference is immaterial.

Lesson 2: The Interface Matters More Than the Protocol

Kafka’s success isn’t primarily about the ISR protocol — it’s about the log abstraction, the consumer group model, the offset-based consumption, the retention and replay capabilities. These are interface-level decisions, not protocol-level decisions. You could implement the Kafka API on top of Raft (Redpanda does) or on top of a different quorum protocol (Pulsar/BookKeeper) and get a system that’s functionally equivalent from the application’s perspective.

This suggests that most of the energy spent debating consensus protocols would be better spent designing the right abstraction for the application. The protocol is an implementation detail. The abstraction is the product.

Lesson 3: Operational Simplicity Wins

ZooKeeper is a well-designed, formally verified, battle-tested consensus system. Kafka used it for a decade and then spent years replacing it. Why? Because operational simplicity matters more than protocol elegance.

Running two distributed systems (Kafka + ZooKeeper) is harder than running one (Kafka with KRaft). The operational complexity isn’t additive — it’s multiplicative, because failure modes can combine in unexpected ways. A ZooKeeper session timeout causes a Kafka controller election, which triggers partition reassignment, which causes consumer rebalancing, which causes a processing delay, which causes backpressure, which causes producer timeouts. Each component is well-designed in isolation; the emergent behavior is chaos.

KRaft doesn’t improve Kafka’s theoretical properties. It improves Kafka’s operational properties. And in production systems, operational properties dominate.

Lesson 4: Engineering Culture Matters

Why does engineering culture gravitate toward Kafka’s pragmatic approach over academic elegance?

Because most engineers are evaluated on shipping features, not on proof correctness. A system that works 99.999% of the time, is operationally understandable, has good monitoring, and can be debugged by on-call engineers at 3 AM is more valuable than a system that works 100% of the time (in theory) but requires a PhD to understand when something goes wrong.

This isn’t anti-intellectual — it’s a recognition that production systems are sociotechnical artifacts. They’re operated by humans, and the human factors (understandability, debuggability, operational familiarity) are as important as the algorithmic factors (message complexity, latency bounds, safety proofs).

Kafka’s documentation explains what happens when a broker fails. Paxos’s documentation explains what happens in an abstract model of asynchronous message passing. Both are valuable, but only one of them helps you at 3 AM.

Lesson 5: Sometimes the Industry Is Right

The academic consensus community has produced protocols that are theoretically superior to what most production systems use. EPaxos has better throughput than Raft for multi-leader workloads. Flexible Paxos provides stronger configurability. Various BFT protocols offer safety guarantees that CFT protocols can’t match.

And yet, the industry gravitates toward Kafka-style ISR, Raft, and “just use a single leader.” Is the industry wrong?

Sometimes, yes. There are genuine cases where a team uses Raft when EPaxos would be better, or uses eventual consistency when they actually need strong consistency. But more often, the industry is right — not because the simpler protocols are always theoretically superior, but because the total cost of ownership (implementation effort + operational burden + debugging difficulty + hiring difficulty) is lower.

The team that builds on Raft can hire engineers who’ve used Raft. The team that builds on EPaxos can hire… the small number of engineers who’ve read the EPaxos paper and the even smaller number who’ve implemented it. For most organizations, the pool of available expertise is a more binding constraint than the theoretical properties of the protocol.

The Anatomy of “We’re Building Something Like Kafka”

Having discussed why Kafka is influential, let’s dissect what actually happens when a team utters the fateful words “we’re building something like Kafka.”

Phase 1: The Prototype (Weeks 1-4)

The team builds a single-node append-only log with a simple TCP protocol. Producers connect and append messages. Consumers connect and read by offset. It works beautifully. Performance is excellent (sequential writes to a single node are fast). The team is pleased. “See? Kafka is over-engineered.”

Phase 2: Replication (Months 2-4)

Someone asks what happens when the node dies. The team adds replication. “It’s just sending the same data to another node.” They implement leader-follower replication. It mostly works, except:

  • What happens when the leader dies? They need leader election. They reach for a consensus protocol or an external coordinator.
  • What about messages that the leader accepted but didn’t replicate? They discover the meaning of “committed” vs “accepted” and why the distinction matters.
  • How does a new follower catch up? They implement state transfer. It’s harder than expected, especially while the leader is still accepting writes.

Phase 3: Consumers (Months 4-8)

Adding multiple consumers reveals new problems:

  • How do consumers coordinate who reads which partition? They need a group coordinator.
  • What happens when a consumer crashes mid-processing? They need to distinguish between “read” and “processed” and implement offset commits.
  • What about rebalancing when consumers join or leave? They implement a rebalance protocol. The first version has a stop-the-world pause. The second version has bugs during concurrent rebalances. The third version works but is complex.

Phase 4: Operations (Months 8-12)

The system is in production. New requirements emerge:

  • Retention policies (the disk isn’t infinite)
  • Monitoring (how to detect under-replicated partitions, consumer lag, leader skew)
  • Partition reassignment (a broker is being decommissioned)
  • Rolling upgrades (deploying a new version without downtime)
  • Performance tuning (batch sizes, buffer sizes, compression)

Each of these is a week to a month of work, with production incidents along the way.

Phase 5: Acceptance (Month 12+)

The team realizes they have built 80% of Kafka with 20% of the features, 10% of the testing, and 5% of the documentation. The “lightweight” system is no longer lightweight. The maintenance burden is significant. New team members struggle to understand the custom replication protocol.

Someone suggests migrating to actual Kafka. The migration takes six months.

This cycle has played out at enough companies to be a pattern. It’s not that the team is incompetent — it’s that the problem space is inherently complex, and Kafka’s complexity reflects the problem, not engineering excess.

The Influence on System Design Patterns

Kafka’s influence extends beyond systems that compete with or copy Kafka. It has shaped how the industry thinks about several design patterns.

Event Sourcing and CQRS

The event sourcing pattern (storing state changes as an immutable log of events, rather than storing current state directly) was known before Kafka, but Kafka made it practical. Before Kafka, implementing event sourcing required building your own durable event store. After Kafka, the event store was a commodity.

CQRS (Command Query Responsibility Segregation) — separating read models from write models — becomes natural when your writes go to a Kafka topic and your read models are materialized by consuming from that topic. The topic is the source of truth, and read models are derived views that can be rebuilt by replaying.
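Rebuilding a read model by replaying the topic is mechanically simple: fold the event stream, in order, into whatever view the queries need. A sketch with an in-memory list standing in for the Kafka topic; the event shape `(op, key, value)` is hypothetical:

```python
def materialize(events):
    """Fold an ordered event stream into a current-state read model."""
    view = {}
    for op, key, value in events:
        if op == "upsert":
            view[key] = value
        elif op == "delete":
            view.pop(key, None)
    return view

topic = [("upsert", "user:1", "Ada"),
         ("upsert", "user:2", "Grace"),
         ("delete", "user:1", None),
         ("upsert", "user:2", "Grace H.")]
assert materialize(topic) == {"user:2": "Grace H."}
# Rebuilding a corrupted or redesigned view is just replaying from offset 0.
```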

This pattern has become so common that it’s sometimes applied where it isn’t needed (not every CRUD application benefits from event sourcing), but where it fits, it’s powerful. And it only became widespread because Kafka provided the infrastructure to support it.

Change Data Capture

CDC (Change Data Capture) — streaming database changes to downstream consumers — was an old idea implemented with triggers, polling, or database-specific log readers. Kafka Connect and Debezium standardized CDC as “read the database’s WAL, publish to Kafka, consume from Kafka.” This pattern is now the default approach for replicating data between systems.
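A Debezium-style change event makes the pattern concrete: each WAL entry becomes a record carrying the row's before and after images plus an operation code ("c"reate, "u"pdate, "d"elete). The envelope below follows Debezium's general shape; treat the specific field values as illustrative:

```python
change_event = {
    "op": "u",                                # c=create, u=update, d=delete
    "before": {"id": 42, "email": "old@example.com"},
    "after":  {"id": 42, "email": "new@example.com"},
    "source": {"db": "shop", "table": "users", "lsn": 123456},  # WAL position
    "ts_ms": 1700000000000,
}

# Keying the Kafka record on the row's primary key preserves per-row ordering,
# since all changes for one row land in one partition.
record_key = change_event["after"]["id"]
assert record_key == 42
```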

The consensus implication is interesting: the database’s own WAL (which is built on some form of consensus or durable write protocol) is re-published through Kafka’s ISR protocol. The data passes through two replication systems, each with its own durability guarantees. The end-to-end guarantee is the weaker of the two, which is worth understanding but rarely causes problems in practice.

The “Kafka as a Database” Debate

An ongoing debate in the Kafka community is whether Kafka itself can serve as a database. Proponents point to Kafka Streams’ state stores (backed by RocksDB, with changelog topics for replication), ksqlDB (SQL queries over Kafka topics), and log compaction (which retains the latest value per key, making a topic function like a key-value store).
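Log compaction is the piece that makes a topic behave like a key-value store: the cleaner retains only the newest record per key, and a record with a null value (a "tombstone") deletes the key entirely. A minimal sketch of that retention rule:

```python
def compact(log):
    """Keep only the newest record per key; None is a tombstone that
    removes the key, as in Kafka's compacted topics."""
    latest = {}
    for key, value in log:       # log is ordered oldest -> newest
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}

log = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k2", None)]
assert compact(log) == {"k1": "c"}   # k1 keeps its latest value, k2 is deleted
```

Real compaction runs segment by segment in the background and never compacts the active segment, but the end state is the same: the topic converges to one record per live key.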

Opponents point out that Kafka lacks transactions across topics (partially addressed by exactly-once semantics), efficient point lookups (Kafka is optimized for sequential reads, not random access), and the query capabilities of a real database.

The truth is somewhere in between. Kafka can serve as the system of record for event-stream data, and Kafka Streams/ksqlDB can provide materialized views over that data. But using Kafka as a general-purpose database (replacing PostgreSQL or MySQL) is a category error. The log abstraction is powerful but not universal.

What Kafka Gets Wrong

Fairness demands that we also discuss where Kafka falls short.

Tail latency. Kafka’s batching-optimized design means tail latency can be unpredictable. A slow consumer or a compaction storm on the broker can cause latency spikes that are difficult to diagnose.

Small message overhead. If your messages are 100 bytes each, Kafka’s per-message overhead (metadata, headers, CRC) is a significant fraction of the total. Kafka is optimized for messages of 1KB-1MB; below that range, the overhead becomes noticeable.

Partition count scaling. Each partition has a leader, a set of replicas, and associated file handles, memory buffers, and controller state. Kafka clusters with hundreds of thousands of partitions experience controller bottlenecks, slow leader elections, and elevated memory usage. Improvements are ongoing, but this remains a practical limitation.

No server-side filtering. Kafka delivers all records in a partition to the consumer; the consumer must filter client-side. If you only care about 1% of records in a partition, you’re reading and discarding 99%. Some competing systems (Pulsar with server-side filtering, various cloud offerings with subscription filters) do better here.

Ordering across partitions. If you need events for different keys to be ordered relative to each other, Kafka can’t help unless they’re in the same partition. There’s no cross-partition ordering primitive.
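Per-key ordering falls out of the partitioner: Kafka's default partitioner hashes the record key (with murmur2) modulo the partition count, so every record for a given key lands on the same partition, in append order. A sketch with Python's built-in `hash` standing in for murmur2:

```python
def partition_for(key, num_partitions):
    """Same key -> same partition, so per-key order is preserved.
    (Kafka's default partitioner uses murmur2; hash() is a stand-in.)"""
    return hash(key) % num_partitions

p = partition_for("user:42", 6)
assert p == partition_for("user:42", 6)   # stable for a given key
assert 0 <= p < 6
# "user:42" and "user:43" may map to different partitions, and Kafka makes
# no ordering promise between them -- that's the limitation described above.
```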

These aren’t fundamental flaws — they’re design tradeoffs that are correct for Kafka’s target use case (high-throughput event streaming) and incorrect for other use cases (low-latency request-response, small-message-high-volume telemetry, complex event processing).

The Verdict

Everyone copies what Kafka does because what Kafka does is, for a broad class of problems, the right thing to do. The log abstraction is powerful. The consumer group model is practical. The ISR replication model is simple enough to understand and reliable enough to trust. The operational model is well-documented and well-tooled.

If your problem looks like “move data reliably between services with ordering and persistence,” the Kafka model is your starting point. Whether you use Kafka itself, a Kafka-compatible alternative (Redpanda), a Kafka-inspired alternative (Pulsar), or a managed service (Kinesis, Event Hubs), the architectural pattern is the same.

The mistake is thinking that because Kafka’s pattern works for data streaming, it must also work for coordination (use a consensus protocol), transactional processing (use a database), or low-latency RPC (use a proper RPC framework). Kafka’s influence is broad, but it’s not universal. Knowing where the pattern applies and where it doesn’t is the difference between copying Kafka wisely and copying Kafka reflexively.

A Field Guide to Kafka-Influenced Design Decisions

To close this chapter, here’s a quick reference for the design decisions that Kafka popularized and their applicability beyond Kafka:

  • Append-only log as the primary abstraction. Kafka’s approach: all writes are appends; no in-place updates. Copy it for: event streaming, audit logs, CDC, event sourcing. Avoid it for: OLTP databases and in-place update workloads.
  • Per-partition ordering (not total ordering). Kafka’s approach: total order within a partition; no cross-partition ordering. Copy it when: entities can be partitioned independently. Avoid it when: you need global ordering (financial ledgers, serializable transactions).
  • Consumer-managed offsets. Kafka’s approach: consumers track their own position; the server doesn’t track per-consumer state. Copy it for: high fan-out (many consumers per topic). Avoid it for: low fan-out with complex acknowledgment needs.
  • Retention-based lifecycle. Kafka’s approach: messages are retained by time or size, not by consumption. Copy it when: replay and backfill are important. Avoid it when: storage cost is critical and messages are consumed once.
  • Dynamic replica set (ISR). Kafka’s approach: replicas self-manage membership based on replication lag. Copy it when: operational simplicity matters more than formal guarantees. Avoid it when: you need provable consensus properties.
  • Batching at every layer. Kafka’s approach: producers, brokers, and consumers all batch. Copy it for: high-throughput workloads. Avoid it for: ultra-low-latency workloads (< 1ms).
  • Leader-per-partition. Kafka’s approach: different partitions can have different leaders. Copy it when: aggregate throughput matters more than per-key throughput. Avoid it when: you need a single authority for all operations.

These aren’t Kafka-specific inventions — most have precedents in databases, message queues, or academic systems. But Kafka assembled them into a coherent package and proved they work at scale. That packaging is Kafka’s real contribution, and it’s the reason everyone keeps copying it.