Chapter 5: Modern Lossless — Deflate, Brotli, Zstandard
The LZ family gave us the dictionary method. Huffman coding gave us the entropy stage. The insight of DEFLATE was: combine them. Use LZ77 to remove repetition, then Huffman to compress the residue. The result became the compression workhorse of the internet — used inside gzip, zlib, PNG, PDF, and zip files.
Brotli and Zstandard improved on DEFLATE with better probability modeling, larger dictionaries, and faster algorithms. Zstd in particular has become the default choice for new systems, with a design that's both faster and more compressible than gzip.
DEFLATE: LZ77 + Huffman
DEFLATE was defined by Phil Katz in 1991 (for PKZIP) and later formalized in RFC 1951. It's not one algorithm — it's a carefully designed combination:
DEFLATE pipeline:
Raw data
│
▼
LZ77 matching
(find back-references in 32KB sliding window)
│
▼
Token stream: literals and (offset, length) pairs
│
▼
Huffman encoding
(two trees: one for literals/lengths, one for offsets)
│
▼
Compressed bitstream
The Two Huffman Trees
DEFLATE uses two Huffman trees simultaneously:
Literal/Length tree: Encodes 286 possible values.
- 0–255: literal byte values
- 256: end-of-block marker
- 257–285: match lengths (lengths 3–258, with some range-coded)
Distance tree: Encodes 30 possible distance codes (representing the back-reference offset). Short distances get short codes; long distances (near 32KB) get longer codes.
DEFLATE symbol types:
Literal 'A' (0x41):
┌──────────┐
│ 0x41 │ → Literal/Length Huffman code for 0x41
└──────────┘
Back-reference (offset=7, length=5):
┌──────────┬─────────────┐
│ Len code │ Dist code │
│ (261) │ (4) │
└──────────┴─────────────┘
Length 261 means "length 5" (with no extra bits)
Distance 4 means "offset 7-8" (with 1 extra bit)
Block Types
DEFLATE has three block types, and the compressor chooses per-block which to use:
- Type 0 (stored): No compression. Used when the input is already compressed or random.
- Type 1 (fixed Huffman): Uses predefined, hardcoded Huffman codes. Fast to encode, slightly suboptimal.
- Type 2 (dynamic Huffman): Stores custom Huffman trees at the start of each block, then uses them. Optimal but has header overhead.
gzip file structure:
┌─────────────────────────────────────────────────────┐
│ gzip header (10 bytes) │
│ - Magic: \x1f\x8b │
│ - Compression method: 8 (DEFLATE) │
│ - Flags, modification time, OS │
├─────────────────────────────────────────────────────┤
│ DEFLATE compressed data (multiple blocks) │
│ Block 1: [type][Huffman tables if type 2][data] │
│ Block 2: [type][Huffman tables if type 2][data] │
│ ... │
│ Final block: BFINAL bit set │
├─────────────────────────────────────────────────────┤
│ CRC-32 of original data (4 bytes) │
│ Original size mod 2^32 (4 bytes) │
└─────────────────────────────────────────────────────┘
DEFLATE's Limitations
DEFLATE was designed in 1991 and it shows:
- 32KB window: Modern files are often megabytes or gigabytes. Patterns farther than 32KB apart can't be referenced.
- Huffman entropy coding: As we saw in Chapter 3, arithmetic coding and ANS can do better for skewed probabilities.
- Static two-tree model: The Huffman trees are trained on the block, not on the specific statistics of literals vs. back-references.
- Sequential only: No parallelism; compressor must process the stream serially.
Despite these limitations, DEFLATE is everywhere. The decompressor is simple enough to implement in firmware, the format is standardized and stable, and it's good enough for most uses.
Brotli: Google's Web Compressor
Brotli was released by Google in 2013 and standardized in RFC 7932 (2016). It's designed primarily for compressing web content (HTML, CSS, JS), and it has a trick that DEFLATE doesn't: a built-in static dictionary.
The Static Dictionary
Brotli ships with a 120KB dictionary of common web strings. Things like "<!DOCTYPE html>", "Content-Type: ", "text/javascript", "function ", and thousands of common JavaScript, HTML, and CSS tokens.
References to this static dictionary can be made without the dictionary appearing in the compressed stream at all — the decompressor already has it. This gives Brotli a massive advantage on small web resources where a traditional compressor would waste most of its window learning the format.
Brotli static dictionary example:
Compressing: '<html lang="en">'
Without static dict:
Encode each character literally, or hope it appears earlier in the window.
With Brotli static dict:
'<html ' → static dict entry #1234 (immediate reference, no prior occurrence needed)
'lang="en"' → static dict entry #5678
'>' → literal
Much better compression for the first occurrence of common patterns.
Brotli's Other Improvements
Beyond the static dictionary, Brotli improves on DEFLATE with:
- Larger sliding window: Up to 16MB (vs 32KB for DEFLATE)
- Context modeling: Uses 64 context maps, conditioning symbol probabilities on the context (previous bytes, literal vs. length/distance)
- Distance ring buffer: Tracks the four most recent distances; referencing a recent distance costs fewer bits
- Better match finding: More sophisticated algorithms for finding long matches
Compression ratios on web content (lower bits/byte = better):
Format HTML JS CSS Average
────────────────────────────────────────────────────
Uncompressed (bytes) 100% 100% 100% 100%
gzip level 6 27% 32% 23% 27%
Brotli level 6 22% 26% 19% 22%
Brotli level 11 19% 23% 17% 20%
Brotli compresses better, but it's slower than gzip at comparable quality settings. This is acceptable for static assets that are compressed once and served many times. For dynamic content, gzip or Zstd are better choices.
Brotli's decompression speed is comparable to gzip — fast enough for real-time web serving.
Zstandard: The New General Purpose
Zstandard (zstd) was developed at Facebook and open-sourced in 2016. It's designed to be:
- Fast at both compression and decompression
- Competitive with gzip at comparable quality
- Better than gzip at higher quality settings
- A drop-in replacement for zlib in many contexts
Zstandard has become the default compression algorithm for Linux kernels, Python wheel packages, Facebook's storage infrastructure, and Debian packages.
Zstd's Core Architecture
Zstandard encoding pipeline:
Raw data
│
├──► LZ77 matching (larger window than DEFLATE, configurable)
│
├──► Sequence encoding (literals + match tokens)
│
├──► Literal compression (tANS/FSE entropy coding)
│
├──► Sequence header compression (tANS for match commands)
│
▼
zstd frame (with optional frame header, checksums)
The key innovations in Zstd:
tANS (Finite State Entropy): Zstd uses tANS instead of Huffman for entropy coding. As discussed in Chapter 3, this approaches arithmetic coding quality at Huffman speed.
Training dictionaries: Zstd supports trained dictionaries — if you're compressing many similar small files (like JSON API responses), you can train a dictionary on a sample set. References to the dictionary dramatically improve compression of small inputs.
Configurable window size: DEFLATE is stuck at 32KB. Zstd can use windows up to 2GB, enabling better compression of large files with distant repetitions.
Parallel compression: zstd can split the input into blocks and compress them in parallel using multiple cores (with the --threads flag). Output is still a single stream, fully compatible with single-threaded decompression.
Zstd Compression Levels
Zstd's level system is broader than gzip's 1-9:
Zstd compression levels:
Level Speed (encode) Ratio Notes
──────────────────────────────────────────────────────
1 Very fast Moderate Competitive with LZ4
3 Fast Good Default; beats gzip speed
6 Moderate Good Near gzip -6 quality, faster
9 Moderate Better Better than gzip -9
12 Slow Best fast Starts using more memory
19 Very slow Excellent Near Brotli quality
22 Extremely slow Maximum (--ultra flag required)
Negative levels (e.g. -1, -5): faster than level 1, less compression.
Useful for real-time or log compression where CPU is the bottleneck.
# Python example using zstandard library
import zstandard as zstd
# Basic compression
cctx = zstd.ZstdCompressor(level=3)
compressed = cctx.compress(data)
dctx = zstd.ZstdDecompressor()
original = dctx.decompress(compressed)
# Streaming (memory-efficient for large data)
cctx = zstd.ZstdCompressor(level=6)
with open('large_file.txt', 'rb') as fin:
with open('large_file.zst', 'wb') as fout:
cctx.copy_stream(fin, fout)
# With a trained dictionary (for many small files)
# Train on samples:
samples = [b'{"user":"alice","action":"login"}',
b'{"user":"bob","action":"view"}',
...] # many samples
params = zstd.ZstdCompressionParameters.from_level(3)
dict_data = zstd.train_dictionary(8192, samples) # 8KB dict
# Compress with dictionary:
cctx = zstd.ZstdCompressor(dict_data=dict_data, level=3)
compressed = cctx.compress(b'{"user":"carol","action":"login"}')
# Much smaller than without dictionary!
Why Zstd Won
Head-to-head: Zstd vs. gzip on a 1GB log file
Metric gzip -6 zstd -3 zstd -6
──────────────────────────────────────────────────────────
Compression time 28s 1.8s 4.2s
Compressed size 312MB 320MB 298MB
Decompression time 3.2s 0.9s 0.9s
Compression ratio 3.2× 3.1× 3.4×
Memory usage (compress) 256KB 8MB 16MB
Memory usage (decompress) 32KB 128KB 128KB
Zstd is ~15× faster to compress at comparable ratio, and ~3.5× faster to decompress. The cost is slightly higher memory usage — acceptable for almost every modern context.
For static web assets, Brotli at level 11 is better (higher compression ratio). For real-time or batch compression of dynamic data, Zstd is the choice.
Putting It Together: When to Use What
Decision tree for lossless compression:
Is compatibility with old tools required?
├─ Yes → gzip/DEFLATE (ubiquitous)
└─ No → continue
Is compression speed the primary constraint?
├─ Yes (real-time, < 100ms) → LZ4 or Zstd level 1
└─ No → continue
Is decompression speed critical?
├─ Very (database, in-memory) → LZ4 or Snappy
└─ No → continue
Is the data primarily web content?
├─ Yes, static assets → Brotli level 6-11
├─ Yes, dynamic content → Zstd level 3 or gzip
└─ No → continue
Is maximum compression ratio needed?
├─ Yes, time doesn't matter → LZMA/7-zip (Chapter 6)
└─ No → Zstd level 6-9 (sweet spot)
Under the Hood: The zlib API
Most code doesn't call DEFLATE directly — it calls zlib, the C library that wraps DEFLATE and exposes a streaming API. Understanding the zlib API helps when debugging compression issues:
#include <zlib.h>
// Compress in one shot
int compress_buffer(const uint8_t *src, size_t src_len,
uint8_t *dst, size_t *dst_len) {
z_stream strm = {0};
deflateInit(&strm, Z_DEFAULT_COMPRESSION); // level 6
strm.next_in = (Bytef*)src;
strm.avail_in = src_len;
strm.next_out = dst;
strm.avail_out = *dst_len;
int ret = deflate(&strm, Z_FINISH);
*dst_len = strm.total_out;
deflateEnd(&strm);
return ret == Z_STREAM_END ? Z_OK : ret;
}
The streaming API (deflate/inflate called repeatedly) handles data larger than memory by processing in chunks. This is how nginx compresses HTTP responses on-the-fly without buffering the entire response body.
Summary
- DEFLATE = LZ77 + Huffman, 32KB window. Universal, fast to decompress, shows its age.
- Brotli adds a static web dictionary and better entropy coding. Ideal for static web assets.
- Zstandard uses tANS entropy coding, configurable windows, and trained dictionaries. The modern default for general-purpose compression — faster than gzip, better compression at high levels.
- For most new systems: use Zstd. For web serving: serve Brotli for supporting clients, gzip for older ones.
Next up: LZMA and 7-Zip — what happens when you push lossless compression to its practical limit.