Redis Streams
Every few years, a technology that is good at one thing gets ambitious and tries to be good at a second thing. Sometimes this works brilliantly — PostgreSQL adding JSONB, for instance. Sometimes it produces something awkward. Redis Streams falls somewhere in the middle: it is a genuinely useful data structure that solves real problems, but it carries the DNA of an in-memory cache into a domain that traditionally demands durable, disk-backed storage. Understanding where that tension matters — and where it does not — is the key to using Redis Streams well.
Overview
Redis Streams were introduced in Redis 5.0, released in October 2018. The feature was designed and implemented by Salvatore Sanfilippo (antirez), Redis's creator, who described it as "a new data type modeling a log data structure in a more abstract way." If that sounds like he was thinking about Kafka, he was — though he was careful to position Streams as a Redis-native feature, not a Kafka replacement.
The design intent was to add a proper event log data structure to Redis that supported consumer groups, acknowledgement, and persistence. Before Streams, Redis developers cobbled together pub/sub (which has no persistence and no consumer groups), Lists (which have no fan-out and awkward consumer group semantics), and Sorted Sets (which work but are a contortion). Streams gave Redis a first-class log structure, and it is the right tool for a specific category of problems.
Redis is maintained by Redis Ltd. (formerly Redis Labs), which provides the commercial Redis Enterprise product. The open-source Redis project uses a dual-license model (RSALv2 and SSPLv1 since Redis 7.4 in 2024), and the community fork Valkey (maintained by the Linux Foundation) also supports Streams. For the purposes of this chapter, everything applies equally to Redis and Valkey unless noted.
Architecture
The Data Structure
A Redis Stream is, at its core, an append-only log of entries, each identified by a unique ID. The implementation uses a radix tree of macro-nodes, where each macro-node contains a listpack (a compact serialised list of entries). This structure is optimised for two access patterns: appending new entries (fast, O(1) amortised) and reading entries by ID range (fast, O(log N) for the seek plus O(M) for the range scan).
Each stream entry has:
- An ID: By default, a millisecond timestamp plus a sequence number, formatted as <millisecondsTime>-<sequenceNumber> (e.g., 1699958537443-0). IDs are monotonically increasing and auto-generated by default, though you can provide custom IDs as long as they are greater than the last entry's ID. The timestamp-based ID is elegant — it means you can seek to a point in time without maintaining a separate index.
- A body: A sequence of field-value pairs, like a flat hash. Every entry in a stream can have different fields, though in practice you will want consistency for consumer sanity.
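Because the default ID embeds a millisecond timestamp, an entry's creation time can be recovered with no metadata lookup at all. A minimal sketch (the helper name is illustrative, not part of any Redis API):

```python
from datetime import datetime, timezone

def parse_entry_id(entry_id: str):
    """Split a stream ID like '1699958537443-0' into (UTC datetime, sequence)."""
    ms, seq = entry_id.split('-')
    ts = datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)
    return ts, int(seq)

# The example ID from above decodes to a UTC datetime in November 2023:
# parse_entry_id('1699958537443-0')
```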
The Command Vocabulary
Redis Streams expose a small, well-designed API:
XADD appends an entry to a stream. Returns the auto-generated ID. Optionally trims the stream (more on this below).
XADD orders * orderId ord-7829 totalAmount 149.99 region us-east-1
The * tells Redis to auto-generate the ID. The response is something like 1699958537443-0.
XLEN returns the number of entries in a stream. O(1). Useful for monitoring.
XRANGE and XREVRANGE read entries by ID range. This is how you read historical data:
XRANGE orders 1699958537443-0 + COUNT 10
XREAD reads new entries from one or more streams, optionally blocking until entries are available. This is the simple consumer model — no consumer groups, just "give me everything since this ID":
XREAD BLOCK 5000 STREAMS orders 1699958537443-0
XREADGROUP is the consumer group read. This is where Streams gets interesting:
XREADGROUP GROUP order-processors worker-1 COUNT 10 BLOCK 5000 STREAMS orders >
The > means "give me new entries that have not been delivered to any consumer in this group." You can also re-read entries that were delivered but not acknowledged by using a specific ID instead of >.
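On startup, a consumer can use this to replay its own delivered-but-unacknowledged entries by passing 0 instead of >. A sketch assuming a redis-py client is passed in (the function name is illustrative):

```python
def replay_own_pending(client, stream, group, consumer):
    """Re-read entries already delivered to this consumer but never XACKed.

    Passing '0' (instead of '>') returns this consumer's own PEL entries
    from the beginning; it does not deliver anything new.
    """
    results = client.xreadgroup(group, consumer, {stream: '0'}, count=100)
    pending = []
    for _stream_name, messages in results or []:
        pending.extend(messages)
    return pending
```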
XACK acknowledges processing of one or more entries:
XACK orders order-processors 1699958537443-0
XCLAIM transfers ownership of a pending entry from one consumer to another. This is for failure recovery — if consumer A received a message and then died, consumer B can claim it:
XCLAIM orders order-processors worker-2 3600000 1699958537443-0
The 3600000 is the minimum idle time in milliseconds. XCLAIM will only claim entries that have been idle for at least this long, preventing you from stealing work from a consumer that is merely slow.
XAUTOCLAIM (added in Redis 6.2) automates the claim process by scanning the pending entries list for entries that have exceeded the idle threshold and claiming them in one operation. This is the command you actually want in production — it replaces the manual "scan PEL, filter by idle time, claim individually" loop:
XAUTOCLAIM orders order-processors worker-2 3600000 0-0 COUNT 10
XPENDING inspects the pending entries list — entries that have been delivered but not acknowledged:
XPENDING orders order-processors
This returns the total pending count, the range of pending IDs, and per-consumer counts. Invaluable for debugging consumer health.
XINFO provides metadata about streams, groups, and consumers. Use XINFO STREAM, XINFO GROUPS, and XINFO CONSUMERS for operational visibility.
Consumer Groups and the Pending Entries List
Consumer groups are the mechanism that transforms Redis Streams from a simple log into a workable message broker. A consumer group:
- Maintains a last-delivered ID — the cursor tracking which entries have been dispatched to consumers in this group.
- Maintains a pending entries list (PEL) — a list of entries that have been delivered to consumers but not yet acknowledged.
- Distributes new entries across consumers within the group in a round-robin fashion (roughly — the distribution depends on which consumer calls XREADGROUP first).
The PEL is the crucial data structure. It tracks, for each pending entry: the entry ID, the consumer name, the delivery timestamp, and the delivery count. This enables:
- At-least-once delivery: Entries stay in the PEL until acknowledged. If a consumer crashes, the entries remain pending and can be claimed by another consumer.
- Failure detection: Entries that have been pending for a long time indicate a dead or stuck consumer.
- Redelivery tracking: The delivery count tells you how many times an entry has been delivered, enabling dead-letter logic in your application code (Redis does not have native DLQ support for Streams — you implement it yourself).
Creating a consumer group:
XGROUP CREATE orders order-processors $ MKSTREAM
The $ means "start consuming from new entries only." Use 0 to start from the beginning of the stream.
Memory Management and Trimming
Redis Streams live in memory. This is simultaneously the source of their speed and their primary constraint. An unbounded stream will grow until Redis runs out of memory, at which point Bad Things Happen (eviction or OOM, depending on your configuration).
Trimming strategies:
MAXLEN caps the stream at a maximum number of entries. You can specify it on every XADD:
XADD orders MAXLEN ~ 1000000 * orderId ord-7829 totalAmount 149.99
The ~ is an important detail — it tells Redis to trim approximately to the max length, allowing it to trim only when it can remove an entire macro-node from the radix tree. This is significantly faster than exact trimming and is what you should use in production.
MINID trims entries with IDs less than the specified minimum. This is time-based trimming:
XADD orders MINID ~ 1699872137443 * orderId ord-7829 totalAmount 149.99
This removes entries older than the specified timestamp, which is often a more natural retention policy than a count-based limit.
In production, choose a trimming strategy based on your use case:
- Use MAXLEN ~ when you care about bounding memory usage predictably.
- Use MINID ~ when you care about retaining a time window of entries.
- Run trimming on every XADD (with ~) or periodically from a maintenance process. Do not forget to trim. Seriously.
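For the maintenance-process variant, the standalone XTRIM command can enforce a time-window retention out of band. A sketch assuming redis-py (the helper names are illustrative):

```python
import time

def minid_for_retention(retention_seconds, now_ms=None):
    """Compute the MINID cutoff: entries older than this get trimmed."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{now_ms - retention_seconds * 1000}-0"

def trim_stream(client, stream, retention_seconds):
    """Approximately trim entries older than the retention window.

    approximate=True (the ~ flag) lets Redis drop whole macro-nodes,
    which is far cheaper than exact trimming.
    """
    cutoff = minid_for_retention(retention_seconds)
    return client.xtrim(stream, minid=cutoff, approximate=True)
```

Run this from a cron job or scheduler if you prefer not to pay the trim cost on the XADD hot path.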
Persistence: RDB + AOF
Redis Streams are persisted through the same mechanisms as all Redis data structures:
RDB (Redis Database) snapshots are point-in-time dumps of the entire dataset to disk. They are taken periodically (configurable) and are compact and fast to load. The catch: data written between snapshots is lost on crash. For a message broker, this means messages can be lost. The window of potential loss equals the time since the last snapshot.
AOF (Append-Only File) logs every write operation. With appendfsync always, every write is fsynced to disk before returning to the client. This provides the strongest durability guarantee Redis can offer, at the cost of higher latency (every XADD waits for disk I/O). With appendfsync everysec (the default), you get at most one second of data loss on crash.
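A redis.conf fragment for the stronger end of that spectrum might look like this (the snapshot thresholds are illustrative, not a recommendation):

```
# Persist every write to the AOF before acknowledging the client.
appendonly yes
appendfsync always

# Keep RDB snapshots as a compact secondary recovery path:
# snapshot if at least 10000 writes occurred in the last 60 seconds.
save 60 10000
```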
The durability reality check: Even with AOF appendfsync always, Redis's durability guarantees are weaker than Kafka's. Kafka replicates writes to multiple brokers before acknowledging them. Redis with AOF writes to a single disk on a single machine. Redis Sentinel and Redis Cluster add replication, but replication is asynchronous by default — the primary acknowledges the write before the replica receives it. You can use WAIT to block until replicas confirm, but this is per-command, not a cluster-wide guarantee.
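A sketch of that per-command check, assuming redis-py's wait helper (the function name and failure policy are illustrative):

```python
def xadd_with_replica_ack(client, stream, fields, replicas=1, timeout_ms=1000):
    """Append an entry, then block until `replicas` replicas have received it.

    WAIT is per-command and best-effort: a low return value after the
    timeout means the write landed on the primary but was not confirmed
    downstream in time. It is not a cluster-wide durability guarantee.
    """
    entry_id = client.xadd(stream, fields)
    acked = client.wait(replicas, timeout_ms)
    if acked < replicas:
        raise RuntimeError(
            f"entry {entry_id} reached only {acked}/{replicas} replicas")
    return entry_id
```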
If the phrase "we can tolerate losing the last second of messages" makes your compliance team nervous, Redis Streams is not your primary event store.
Redis Cluster and Streams
In a Redis Cluster deployment, each stream lives on a single shard (determined by the stream's key hash). This means:
- A single stream's throughput is limited to a single Redis instance's throughput.
- Consumer groups for a stream operate on a single shard.
- You can scale by sharding across multiple stream keys (e.g., orders:{region} using hash tags).
Redis Cluster does not shard a single stream across multiple nodes. If you need a single logical stream with throughput beyond one Redis instance, you need to partition at the application level — which is exactly the problem Kafka solves with its partition model.
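If you do partition at the application level, the routing logic is yours to write. A minimal sketch of key-based partitioning across N stream keys (the key scheme and partition count are illustrative):

```python
import zlib

def partition_stream_key(base, partition_key, partitions=8):
    """Route entries with the same partition_key to the same stream key,
    preserving per-key ordering — a Kafka partitioner in miniature."""
    slot = zlib.crc32(partition_key.encode()) % partitions
    return f"{base}:{slot}"

def publish_partitioned(client, base, partition_key, fields):
    """XADD to whichever shard-stream this partition key hashes to."""
    return client.xadd(partition_stream_key(base, partition_key), fields)
```

Note that each of the N streams then needs its own consumer group wiring — the fan-out that Kafka's partition model gives you for free.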
Strengths
Sub-millisecond latency. Redis Streams inherits Redis's in-memory performance. XADD and XREADGROUP operations complete in microseconds under normal conditions. If your use case demands the lowest possible latency for event production and consumption, Redis Streams is in a class shared only by NATS Core and Chronicle Queue. Kafka's median latency is measured in single-digit milliseconds; Redis command execution is measured in microseconds, and a local round trip typically stays well under a millisecond.
You already have Redis. This is the pragmatic argument, and it is the strongest one. If your infrastructure already includes Redis for caching, session storage, or rate limiting, adding Streams requires zero new infrastructure. No new clusters to provision, no new operational playbooks, no new monitoring dashboards. The marginal cost of adding an event streaming capability to an existing Redis deployment is almost nothing.
Simple, elegant API. The Streams command set is compact and well-designed. You can learn the core commands (XADD, XREADGROUP, XACK, XPENDING) in an hour. Compare this to the Kafka client API surface area or the RabbitMQ concept vocabulary. Redis Streams is refreshingly small.
Consumer groups that work. The consumer group implementation — with the PEL, XCLAIM, XAUTOCLAIM, and per-consumer tracking — is surprisingly complete. It provides at-least-once delivery, failure recovery, and per-consumer monitoring. It is not as feature-rich as Kafka's consumer group protocol (no automatic rebalancing, no cooperative sticky assignment), but for many use cases it is sufficient.
Built-in time-based indexing. The timestamp-based entry IDs mean you can efficiently query "give me all entries from the last five minutes" without maintaining a secondary index. This is a small thing, but it is a nice thing.
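Since an ID's first component is a millisecond timestamp, a time-window query is just string construction. A sketch assuming redis-py (helper name illustrative):

```python
import time

def entries_since(client, stream, seconds_ago, now_ms=None):
    """Read all entries appended in the last `seconds_ago` seconds,
    using a synthesised ID as the XRANGE lower bound."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    start_id = f"{now_ms - seconds_ago * 1000}-0"
    return client.xrange(stream, min=start_id, max='+')
```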
Lua scripting integration. You can interact with Streams in Lua scripts that execute atomically on the Redis server. This enables transactional patterns (read from one stream, write to another, atomically) that are difficult to achieve with external brokers.
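A sketch of that atomic pattern: a server-side script that copies an entry to a second stream and acknowledges it on the first in one step (stream and group names are illustrative; assumes redis-py's eval):

```python
# Runs atomically on the server: no other command can interleave between
# the XADD to the destination and the XACK on the source.
FORWARD_AND_ACK = """
local entry = redis.call('XRANGE', KEYS[1], ARGV[2], ARGV[2])
if #entry == 0 then return nil end
local new_id = redis.call('XADD', KEYS[2], '*', unpack(entry[1][2]))
redis.call('XACK', KEYS[1], ARGV[1], ARGV[2])
return new_id
"""

def forward_and_ack(client, src_stream, dst_stream, group, entry_id):
    """Atomically forward one entry and acknowledge it on the source."""
    return client.eval(FORWARD_AND_ACK, 2, src_stream, dst_stream, group, entry_id)
```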
Weaknesses
Memory-bound. Everything is in memory. A stream with a million entries of 1 KB each consumes roughly 1 GB of RAM (with overhead). At $10–20/GB/month for cloud instances, storing large event histories in Redis is expensive compared to disk-based brokers. Kafka can retain terabytes of events on cheap disk; Redis retains them in the most expensive tier of the storage hierarchy.
Not designed for high-durability event sourcing. The persistence model (RDB snapshots + AOF) provides reasonable durability for cache and session use cases but falls short of what regulated industries and event sourcing patterns require. Asynchronous replication means acknowledged writes can be lost during failover. There is no equivalent of Kafka's acks=all with min.insync.replicas=2.
Limited replay. You can read historical entries with XRANGE, and XGROUP SETID can rewind a consumer group's last-delivered cursor, but the reset applies to the whole group, leaves the PEL untouched, and requires you to compute the target entry ID yourself. Kafka's consumer group offset reset (including seek-by-timestamp) is a first-class operation; in Redis Streams, replay means manual cursor and PEL management, and the semantics around re-reading acknowledged entries are awkward.
Single-threaded execution. Redis processes commands on a single thread (the multi-threaded I/O introduced in Redis 6.0 handles network processing, but command execution remains single-threaded). A CPU-intensive Lua script or a large XRANGE scan blocks all other operations on that shard. For a general-purpose cache, this is rarely a problem. For a message broker handling thousands of consumers, it can become one.
No native dead letter queue. When a message cannot be processed, your application code must track delivery counts (available in the PEL) and move entries to a separate dead letter stream manually. This is not hard, but it is one more thing to implement and test, and one more thing that can have bugs.
No built-in schema registry. Stream entries are flat field-value pairs of strings. There is no schema enforcement, no schema evolution, no type system. Your producers and consumers agree on message formats through convention and hope. For small teams with good discipline, this is fine. For large organisations, it is a governance gap.
Consumer group rebalancing is manual. When a consumer in a group dies, its pending entries are not automatically redistributed. You must implement XAUTOCLAIM polling or a manual XCLAIM process. Kafka handles this automatically with its consumer group rebalancing protocol. In Redis, you build it yourself.
No cross-node stream. A stream lives on one Redis node. Throughput is bounded by that node's capacity. Horizontal scaling requires application-level partitioning across multiple stream keys, which moves partitioning complexity from the broker (where Kafka handles it) into your application (where you handle it).
Ideal Use Cases
Lightweight event bus between microservices. When your event throughput is moderate (thousands to low tens of thousands per second), your retention requirements are short (minutes to hours, not days to months), and your infrastructure already includes Redis. This is the sweet spot.
Real-time activity feeds. Social media timelines, notification streams, activity logs where recency matters and historical depth does not. Redis Streams' sub-millisecond latency and built-in trimming make it natural for "what happened in the last N minutes" queries.
Task queues with visibility. When you need a task queue with consumer groups, acknowledgement, and the ability to inspect pending work, Redis Streams is a compelling alternative to dedicated task queue systems (Celery, Sidekiq, Bull). The PEL gives you complete visibility into what is in flight.
Inter-service communication in a monorepo. When services are co-located and you want a simple event bus without the operational overhead of a dedicated broker. Redis Streams lets you add event-driven communication incrementally.
Rate-limited processing pipelines. Using consumer groups with controlled XREADGROUP COUNT, you can build processing pipelines that naturally throttle throughput. This is useful for feeding rate-limited APIs, managing database write pressure, or smoothing bursty workloads.
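A sketch of the throttling idea: COUNT caps the batch, and the caller's cadence sets the drain rate (all names and the rate policy are illustrative):

```python
def drain_tick(client, stream, group, consumer, per_second, handler):
    """Process at most `per_second` entries, then return.

    Call this once per second from a scheduler loop: COUNT bounds the
    batch, so the consumer never pulls faster than a downstream
    rate-limited API or fragile database can absorb.
    """
    batch = client.xreadgroup(group, consumer, {stream: '>'},
                              count=per_second, block=1000)
    processed = 0
    for _name, messages in batch or []:
        for msg_id, fields in messages:
            handler(fields)
            client.xack(stream, group, msg_id)
            processed += 1
    return processed
```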
When NOT to Use It
Long-term event storage. If you need to retain events for weeks, months, or years, Redis Streams is the wrong tool. The memory cost is prohibitive, and the durability guarantees are insufficient.
Regulated environments requiring guaranteed delivery. If losing a message has compliance implications (financial transactions, healthcare records, audit logs), Redis Streams' persistence model is not strong enough. Use Kafka with acks=all, or a database-backed solution.
High-throughput event sourcing. If you are building a system where the event log is the source of truth and you need to replay from the beginning of time, you need Kafka, Pulsar, or a database. Redis Streams is a buffer, not a ledger.
Multi-datacenter replication. Redis Cluster does not support multi-datacenter deployment natively. Redis Enterprise offers active-active geo-replication, but it comes at a significant cost premium. If your event infrastructure must span regions, look elsewhere.
When you don't already have Redis. If Redis is not already in your infrastructure, deploying it solely for Streams is hard to justify when purpose-built brokers exist. The "you already have Redis" argument works in reverse — if you don't have it, the calculus changes.
Operational Reality
Operating Redis Streams means operating Redis, which means you are either running Redis yourself (on VMs or Kubernetes), using a managed service (AWS ElastiCache, GCP Memorystore, Azure Cache for Redis, Redis Cloud), or running Redis Enterprise.
Memory monitoring is the single most critical operational practice. A stream that grows faster than your trimming policy will consume all available memory. Monitor used_memory, stream_length (via XLEN), and set alerts aggressively. Redis's maxmemory-policy does not apply to Streams in a useful way — if Redis evicts your stream to free memory, you have lost your event log. Use noeviction policy and trim your streams explicitly.
Consumer health monitoring requires polling XPENDING regularly. Build a monitoring job that checks each consumer group's pending count and the idle time of the oldest pending entry. If a consumer has entries pending for longer than your expected processing time, it is dead or stuck, and you need to trigger XAUTOCLAIM from a healthy consumer.
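A sketch of that check, assuming redis-py's XPENDING helpers (the threshold and the health policy are illustrative):

```python
def group_is_healthy(client, stream, group, max_idle_ms=60000):
    """Return False if any pending entry has sat unacknowledged past
    max_idle_ms — the usual signature of a dead or stuck consumer."""
    summary = client.xpending(stream, group)
    if summary['pending'] == 0:
        return True
    oldest = client.xpending_range(stream, group,
                                   min=summary['min'],
                                   max=summary['max'], count=1)
    return oldest[0]['time_since_delivered'] <= max_idle_ms
```

Wire the False branch to an alert, or directly to an XAUTOCLAIM run from a known-healthy consumer.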
Persistence configuration depends on your durability requirements. For streams where data loss is acceptable (real-time feeds, ephemeral notifications): RDB snapshots every 60 seconds are fine. For streams where data loss is merely tolerable: AOF with appendfsync everysec. For streams where you are pretending Redis is a durable message broker: AOF with appendfsync always, and accept the latency penalty — then seriously reconsider whether Redis is the right tool.
Upgrading is straightforward. Redis is a single binary with excellent backward compatibility. Streams are part of the core data model, not a plugin. Rolling upgrades in a Cluster or Sentinel deployment work as expected.
Code Examples
Python (redis-py)
import redis
import json
import time
import signal
import sys
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
# --- Producer ---
def publish_order_event(order_id: str, total: float, region: str):
"""
Append an event to the orders stream.
MAXLEN ~ 100000 ensures the stream doesn't grow without bound.
"""
entry_id = r.xadd(
'orders',
{
'orderId': order_id,
'totalAmount': str(total), # Redis stores strings
'region': region,
'eventType': 'OrderPlaced',
'timestamp': str(int(time.time() * 1000)),
},
maxlen=100000,
approximate=True,
)
print(f"Published order {order_id} as {entry_id}")
return entry_id
# --- Consumer group setup ---
def setup_consumer_group(stream: str, group: str):
"""Create a consumer group, starting from new entries."""
try:
r.xgroup_create(stream, group, id='$', mkstream=True)
print(f"Created consumer group '{group}' on stream '{stream}'")
except redis.ResponseError as e:
if 'BUSYGROUP' in str(e):
print(f"Consumer group '{group}' already exists")
else:
raise
# --- Consumer ---
def consume_orders(group: str, consumer_name: str):
"""
Consume from a stream using consumer groups.
Handles new messages, pending recovery, and graceful shutdown.
"""
stream = 'orders'
setup_consumer_group(stream, group)
running = True
def shutdown(sig, frame):
nonlocal running
running = False
signal.signal(signal.SIGTERM, shutdown)
signal.signal(signal.SIGINT, shutdown)
# First, recover any pending entries from a previous crash
recover_pending(stream, group, consumer_name)
while running:
try:
# Read new entries (> means undelivered entries only)
entries = r.xreadgroup(
groupname=group,
consumername=consumer_name,
streams={stream: '>'},
count=10,
block=5000, # Block for 5 seconds
)
if not entries:
continue
for stream_name, messages in entries:
for msg_id, fields in messages:
try:
process_order(fields)
r.xack(stream, group, msg_id)
except Exception as e:
print(f"Failed to process {msg_id}: {e}")
# Don't ack — entry stays in PEL for recovery
except redis.ConnectionError:
print("Redis connection lost, reconnecting...")
time.sleep(1)
print("Consumer shut down gracefully")
def recover_pending(stream: str, group: str, consumer_name: str):
"""
Claim and reprocess entries that were pending for too long.
This handles the case where a previous consumer instance crashed.
"""
while True:
# Claim entries idle for more than 30 seconds
claimed = r.xautoclaim(
name=stream,
groupname=group,
consumername=consumer_name,
min_idle_time=30000, # 30 seconds
start_id='0-0',
count=10,
)
        # xautoclaim returns (next_start_id, claimed_entries, deleted_ids) on
        # Redis 7+; on Redis 6.2 the deleted-IDs list is absent, so unpack
        # defensively to support both.
        next_id, entries, *_deleted = claimed
for msg_id, fields in entries:
try:
process_order(fields)
r.xack(stream, group, msg_id)
print(f"Recovered and processed pending entry {msg_id}")
except Exception as e:
print(f"Failed to recover {msg_id}: {e}")
# Check delivery count for dead-letter logic
pending_info = r.xpending_range(
stream, group, min=msg_id, max=msg_id, count=1
)
if pending_info and pending_info[0]['times_delivered'] > 5:
dead_letter(stream, group, msg_id, fields)
        # With decode_responses=True the cursor is a str; '0-0' means the
        # PEL scan has wrapped around and is complete.
        if next_id == '0-0' or not entries:
break
def dead_letter(stream: str, group: str, msg_id: str, fields: dict):
"""Move a poison message to a dead letter stream."""
r.xadd(f'{stream}:dead-letter', {
**fields,
'original_stream': stream,
'original_id': msg_id,
'dead_lettered_at': str(int(time.time() * 1000)),
})
r.xack(stream, group, msg_id)
print(f"Dead-lettered {msg_id}")
def process_order(fields: dict):
"""Your business logic here."""
print(f"Processing order {fields.get('orderId')}: "
f"${fields.get('totalAmount')} in {fields.get('region')}")
# --- Monitoring helper ---
def stream_health(stream: str, group: str):
"""Print stream and consumer group health metrics."""
length = r.xlen(stream)
info = r.xinfo_groups(stream)
print(f"\nStream '{stream}': {length} entries")
for g in info:
print(f" Group '{g['name']}': "
f"{g['pending']} pending, "
f"{g['consumers']} consumers, "
f"last-delivered: {g['last-delivered-id']}")
consumers = r.xinfo_consumers(stream, g['name'])
for c in consumers:
print(f" Consumer '{c['name']}': "
f"{c['pending']} pending, "
f"idle {c['idle']}ms")
Node.js (ioredis)
const Redis = require('ioredis');
const redis = new Redis();
// --- Producer ---
async function publishOrderEvent(orderId, total, region) {
const id = await redis.xadd(
'orders',
'MAXLEN', '~', '100000',
'*',
'orderId', orderId,
'totalAmount', String(total),
'region', region,
'eventType', 'OrderPlaced',
'timestamp', String(Date.now()),
);
console.log(`Published order ${orderId} as ${id}`);
return id;
}
// --- Consumer with consumer group ---
async function consume(group, consumerName) {
// Create group if it doesn't exist
try {
await redis.xgroup('CREATE', 'orders', group, '$', 'MKSTREAM');
} catch (err) {
if (!err.message.includes('BUSYGROUP')) throw err;
}
console.log(`Consumer ${consumerName} starting in group ${group}`);
while (true) {
try {
const results = await redis.xreadgroup(
'GROUP', group, consumerName,
'COUNT', '10',
'BLOCK', '5000',
'STREAMS', 'orders', '>'
);
if (!results) continue;
for (const [stream, messages] of results) {
for (const [id, fields] of messages) {
// fields is a flat array: ['orderId', 'ord-7829', 'totalAmount', '149.99', ...]
const event = {};
for (let i = 0; i < fields.length; i += 2) {
event[fields[i]] = fields[i + 1];
}
try {
await processOrder(event);
await redis.xack('orders', group, id);
} catch (err) {
console.error(`Failed to process ${id}: ${err.message}`);
// Leave unacked for recovery
}
}
}
} catch (err) {
if (err.message.includes('NOGROUP')) {
console.error('Consumer group does not exist');
break;
}
console.error(`Consumer error: ${err.message}`);
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
}
async function processOrder(event) {
console.log(`Processing order ${event.orderId}: $${event.totalAmount}`);
}
// --- Run ---
// publishOrderEvent('ord-7829', 149.99, 'us-east-1');
// consume('order-processors', 'worker-1');
Verdict
Redis Streams is the messaging solution for pragmatists who already have Redis and do not need a purpose-built message broker. It is not a Kafka replacement, and calling it one does both technologies a disservice. It is a lightweight, high-performance event log that lives where your data already lives, and for the right use cases, it is the simplest path to event-driven communication.
The right use cases are: short-retention event buses, real-time activity feeds, task queues with observability, and inter-service communication at moderate scale. The wrong use cases are: long-term event storage, event sourcing as a system-of-record pattern, high-durability regulated workloads, and anything that needs more throughput than a single Redis node can provide.
The API is small and well-designed. Consumer groups work. The PEL provides genuine operational visibility. Sub-millisecond latency is real. And the total cost of adoption — when Redis is already in your stack — is a few hours of reading documentation and writing a consumer loop.
The honest framing is this: Redis Streams turns your cache into a capable lightweight event bus. Whether that is a good idea depends entirely on whether "lightweight event bus" is what you need. If it is, there is nothing simpler. If it is not, no amount of Redis affection will change the physics of in-memory storage or the mathematics of single-node throughput. Know what you need, and choose accordingly.