Chapter 14: Column Stores vs Row Stores
“How you store data determines what you can do with it efficiently.”
Traditional databases store data row by row. But for analytical queries that scan millions of rows but only need a few columns, this is wasteful. Column stores flip the model, storing data column by column. This chapter explores both approaches and when to use each.
14.1 The Row Store Model (NSM)
Row stores use the N-ary Storage Model (NSM), in which all columns of a row are stored together:
ROW STORE LAYOUT
Table: users (id, name, email, age, city)
Disk/Page Layout:
┌─────────────────────────────────────────────────────────────┐
│ Row 1: [id=1, name='Alice', email='a@b.com', age=30, city='NYC'] │
│ Row 2: [id=2, name='Bob', email='b@b.com', age=25, city='LA'] │
│ Row 3: [id=3, name='Carol', email='c@b.com', age=35, city='NYC'] │
│ ... │
└─────────────────────────────────────────────────────────────┘
Query: SELECT * FROM users WHERE id = 1
→ Read one contiguous chunk → All columns available → Fast!
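To make "one contiguous chunk" concrete, here is a minimal Python sketch (illustrative only, not any real engine's page format) that packs fixed-width rows with the standard struct module; the field widths and the pack_row helper are assumptions for the example:
import struct

# Fixed-width layout for illustration: id (int64), age (int32), city padded to
# 16 bytes. Real pages also carry headers, slot arrays, and variable-width data.
ROW_FMT = struct.Struct("<q i 16s")

def pack_row(row_id, age, city):
    return ROW_FMT.pack(row_id, age, city.encode().ljust(16, b"\0"))

page = b"".join(pack_row(*r) for r in [(1, 30, "NYC"), (2, 25, "LA"), (3, 35, "NYC")])

# SELECT * FROM users WHERE id = 2 -> the row at slot 1 is one contiguous slice.
offset = 1 * ROW_FMT.size
row_id, age, city = ROW_FMT.unpack_from(page, offset)
print(row_id, age, city.rstrip(b"\0").decode())   # 2 25 LA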
Row Store Strengths
OLTP Workloads:
┌─────────────────────────────────────────────────────────────┐
│ SELECT * FROM users WHERE id = 42 │
│ → Fetch one row, all columns, one I/O │
│ │
│ INSERT INTO users VALUES (...) │
│ → Write one contiguous chunk │
│ │
│ UPDATE users SET email = '...' WHERE id = 42 │
│ → Read row, modify, write back │
└─────────────────────────────────────────────────────────────┘
Excellent for: Point queries, full-row access, transactions
Row Store Weaknesses
Analytical Query:
SELECT AVG(age) FROM users;
┌─────────────────────────────────────────────────────────────┐
│ Row 1: [id=1, name='Alice', email='a@b.com', AGE=30, city='NYC'] │
│ Row 2: [id=2, name='Bob', email='b@b.com', AGE=25, city='LA'] │
│ Row 3: [id=3, name='Carol', email='c@b.com', AGE=35, city='NYC'] │
└─────────────────────────────────────────────────────────────┘
│
Only need AGE column!
But must read entire rows
If each row is 200 bytes and age is 4 bytes:
Reading 1M rows = 200MB of I/O
Actually need only 4MB (2% of data!)
14.2 The Column Store Model (DSM)
Column stores use the Decomposition Storage Model (DSM), in which each column is stored separately:
COLUMN STORE LAYOUT
Table: users (id, name, email, age, city)
id column: [1, 2, 3, 4, 5, 6, 7, ...]
name column: ['Alice', 'Bob', 'Carol', 'Dave', ...]
email column: ['a@b.com', 'b@b.com', 'c@b.com', ...]
age column: [30, 25, 35, 28, 42, 31, 27, ...]
city column: ['NYC', 'LA', 'NYC', 'Chicago', ...]
Query: SELECT AVG(age) FROM users
→ Read only age column
→ 4MB instead of 200MB
→ 50x less I/O!
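A rough in-memory sketch of the DSM idea (illustrative data, with Python's array module standing in for on-disk column files): each column is its own array, so AVG(age) touches only one of them.
from array import array

columns = {
    "id":   array("q", [1, 2, 3, 4, 5]),
    "age":  array("i", [30, 25, 35, 28, 42]),
    "city": ["NYC", "LA", "NYC", "Chicago", "NYC"],   # strings kept as a plain list here
}

# SELECT AVG(age) FROM users: scan only the age column.
ages = columns["age"]
print(sum(ages) / len(ages))   # 32.0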
Column Store Strengths
Analytical Queries:
┌─────────────────────────────────────────────────────────────┐
│ SELECT AVG(age), city FROM users GROUP BY city │
│ → Read only age and city columns │
│ → Ignore id, name, email │
│ │
│ SELECT COUNT(*) FROM users WHERE age > 30 │
│ → Read only age column │
│ → Extremely fast filtering │
└─────────────────────────────────────────────────────────────┘
Benefits:
- Read only columns you need
- Better compression (similar values together)
- Vectorized processing (operate on column chunks)
- Better CPU cache utilization
Column Store Weaknesses
Point Query:
SELECT * FROM users WHERE id = 42
┌─────────────────────────────────────────────────────────────┐
│ Must read from 5 different locations (one per column) │
│ Then reconstruct the row │
│ Much slower than row store for this pattern! │
└─────────────────────────────────────────────────────────────┘
INSERT INTO users VALUES (...)
┌─────────────────────────────────────────────────────────────┐
│ Must write to 5 different locations │
│ Columnar format may require append buffers │
│ Slower for single-row operations │
└─────────────────────────────────────────────────────────────┘
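Continuing the same in-memory sketch, a point query against pure column arrays must probe every column and stitch the row back together; the point_lookup helper below is purely illustrative.
columns = {
    "id":   [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age":  [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
}

def point_lookup(columns, row_id):
    pos = columns["id"].index(row_id)                             # find the row's position
    return {col: values[pos] for col, values in columns.items()}  # one probe per column

print(point_lookup(columns, 2))   # {'id': 2, 'name': 'Bob', 'age': 25, 'city': 'NYC'}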
14.3 Compression in Column Stores
Same-type values stored together compress extremely well:
Run-Length Encoding (RLE)
city column: ['NYC', 'NYC', 'NYC', 'LA', 'LA', 'NYC', ...]
RLE: [(NYC, 3), (LA, 2), (NYC, 1), ...]
If sorted by city:
['LA', 'LA', ..., 'NYC', 'NYC', ...]
RLE: [(LA, 50000), (NYC, 150000), ...]
→ Massive compression!
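A minimal run-length encoder/decoder sketch (operating on an in-memory list rather than real column pages):
from itertools import groupby

def rle_encode(values):
    # Collapse consecutive duplicates into (value, run_length) pairs.
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

city = ["NYC", "NYC", "NYC", "LA", "LA", "NYC"]
runs = rle_encode(city)
print(runs)                      # [('NYC', 3), ('LA', 2), ('NYC', 1)]
assert rle_decode(runs) == city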
Dictionary Encoding
city column: ['NYC', 'LA', 'NYC', 'Chicago', 'NYC', ...]
Dictionary: {0: 'NYC', 1: 'LA', 2: 'Chicago'}
Encoded: [0, 1, 0, 2, 0, ...]
Benefits:
- Fixed-width codes (faster processing)
- Much smaller storage
- Comparisons on codes, not strings
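A minimal dictionary-encoding sketch; the dict_encode helper is illustrative and ignores real-world concerns such as dictionary size limits:
def dict_encode(values):
    codes, dictionary = [], {}
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)   # assign the next integer code
        codes.append(dictionary[v])
    return codes, dictionary

city = ["NYC", "LA", "NYC", "Chicago", "NYC"]
codes, dictionary = dict_encode(city)
print(dictionary)   # {'NYC': 0, 'LA': 1, 'Chicago': 2}
print(codes)        # [0, 1, 0, 2, 0]

# The predicate city = 'NYC' becomes a comparison against a single integer code.
nyc_code = dictionary["NYC"]
print(sum(1 for c in codes if c == nyc_code))   # 3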
Bit-Packing
age column: [30, 25, 35, 28, 42, 31, 27, ...]
All values are below 128, so each needs only 7 bits!
7 bits instead of 32 per integer → roughly 4.5x compression
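A minimal bit-packing sketch that stores each value in 7 bits inside one large Python integer; real formats pack into byte-aligned words, but the space accounting is the same:
def bit_pack(values, width):
    packed = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << width)          # every value must fit in `width` bits
        packed |= v << (i * width)
    return packed

def bit_unpack(packed, width, count):
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]

ages = [30, 25, 35, 28, 42, 31, 27]
packed = bit_pack(ages, width=7)              # 7 * 7 = 49 bits vs 7 * 32 = 224 bits
assert bit_unpack(packed, width=7, count=len(ages)) == ages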
Compression Ratios
Row Store: roughly 1-3x typical
Column Store: roughly 5-20x typical
Example:
Raw data: 100GB
Row store (compressed): 50GB
Column store: 5-10GB
14.4 Vectorized Execution
Column stores enable vectorized processing—operating on batches of values:
SCALAR VS VECTORIZED
Scalar (row-at-a-time):
for each row:
if row.age > 30:
output row
→ Function call overhead per row
→ Poor CPU cache utilization
→ Branch mispredictions
Vectorized (column-at-a-time):
age_vector = load 1000 ages
mask = age_vector > 30 # SIMD comparison
output = apply mask # Batch filtering
→ Process 1000 values with one CPU instruction
→ Excellent cache utilization
→ No branches, predictable execution
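A small NumPy sketch of the same contrast (assuming NumPy is available); the elementwise comparison runs a tight, often SIMD-accelerated, loop in C instead of interpreting one branch per row:
import numpy as np

rng = np.random.default_rng(0)
age = rng.integers(18, 80, size=1_000_000, dtype=np.int32)

# Scalar style: one interpreted branch per row.
scalar_count = sum(1 for a in age if a > 30)

# Vectorized style: one boolean mask over the whole column chunk.
mask = age > 30
vector_count = int(mask.sum())

assert scalar_count == vector_count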
SIMD Operations
Modern CPUs can process multiple values simultaneously:
AVX-512: Process 16 32-bit integers at once
Traditional:
for i in 0..16:
result[i] = a[i] + b[i]
→ 16 add instructions
SIMD:
result = _mm512_add_epi32(a, b)
→ 1 instruction for 16 adds!
14.5 Late Materialization
Column stores delay row reconstruction:
Query: SELECT name, email FROM users WHERE age > 30 AND city = 'NYC'
Early Materialization (reconstruct rows first):
1. Scan age column, get matching row IDs: {1, 3, 5, 7, ...}
2. For each ID, reconstruct full row
3. Filter by city
4. Extract name, email
Late Materialization (defer reconstruction):
1. Scan age column, get positions: {1, 3, 5, 7, ...}
2. Scan city column at those positions only
3. Filter: positions {1, 3, ...} remain
4. Only now fetch name, email for final positions
Benefits:
- Skip columns that get filtered out
- Less data movement
- Better cache utilization
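A minimal in-memory sketch of the late-materialization flow above, carrying a position list between column scans; the column contents are illustrative:
columns = {
    "age":   [30, 25, 35, 28, 42],
    "city":  ["NYC", "LA", "NYC", "Chicago", "NYC"],
    "name":  ["Alice", "Bob", "Carol", "Dave", "Eve"],
    "email": ["a@b.com", "b@b.com", "c@b.com", "d@b.com", "e@b.com"],
}

# 1. Scan the age column, keep qualifying positions.
positions = [i for i, a in enumerate(columns["age"]) if a > 30]
# 2. Probe the city column only at those positions.
positions = [i for i in positions if columns["city"][i] == "NYC"]
# 3. Materialize name and email only for the surviving positions.
result = [(columns["name"][i], columns["email"][i]) for i in positions]
print(result)   # [('Carol', 'c@b.com'), ('Eve', 'e@b.com')]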
14.6 Column Store Architectures
Pure Column Stores
Store nothing but columns:
Examples: ClickHouse, DuckDB, Vertica
Each column stored separately (its own file or column segment)
Rows reconstructed by matching positions across columns
Hybrid Row/Column
Keep all the columns of a batch of rows within a single page, but lay them out column by column inside that page:
PAX (Partition Attributes Across):
┌─────────────────────────────────────────────────────────────┐
│ Page contains N rows, stored column-by-column within page │
│ │
│ [all ids] [all names] [all emails] [all ages] [all cities] │
│ │ │ │ │ │ │
│ Row 1-N Row 1-N Row 1-N Row 1-N Row 1-N │
└─────────────────────────────────────────────────────────────┘
Benefits:
- Column-oriented within page (good compression, vectorization)
- Row locality across pages (good for reconstruction)
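A minimal sketch of the PAX idea using plain Python structures: one "page" holds a small batch of rows, but each attribute lives in its own minipage-style list.
def build_pax_page(rows, schema):
    # rows: list of tuples in schema order; produce one list per attribute.
    return {col: [row[i] for row in rows] for i, col in enumerate(schema)}

schema = ("id", "name", "age", "city")
page = build_pax_page(
    [(1, "Alice", 30, "NYC"), (2, "Bob", 25, "LA"), (3, "Carol", 35, "NYC")],
    schema,
)

# Column-oriented access within the page (good for scans and compression):
print(sum(page["age"]) / len(page["age"]))    # 30.0
# Row reconstruction stays page-local (good for point access):
print(tuple(page[col][1] for col in schema))  # (2, 'Bob', 25, 'LA')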
PostgreSQL Column Extensions
-- cstore_fdw (now Citus columnar)
CREATE TABLE events (
    event_id   bigint,
    event_type text,
    data       jsonb,
    created_at timestamp
) USING columnar;
-- Benefits: Compression, column pruning
-- Trade-off: Slower updates, best for append-only
14.7 OLTP vs OLAP
The fundamental difference in workload patterns:
OLTP vs OLAP
OLTP (Online Transaction Processing):
┌─────────────────────────────────────────────────────────────┐
│ • Many small transactions │
│ • Point queries: SELECT * WHERE id = ? │
│ • Single-row INSERT, UPDATE, DELETE │
│ • Low latency critical │
│ • Concurrent users │
│ │
│ Best: Row stores (PostgreSQL, MySQL, Oracle) │
└─────────────────────────────────────────────────────────────┘
OLAP (Online Analytical Processing):
┌─────────────────────────────────────────────────────────────┐
│ • Few large queries │
│ • Full scans: SELECT AVG(x) FROM big_table │
│ • Read-mostly, batch loads │
│ • Throughput critical │
│ • Few concurrent queries │
│ │
│ Best: Column stores (ClickHouse, DuckDB, Snowflake) │
└─────────────────────────────────────────────────────────────┘
14.8 Column Store Examples
ClickHouse
Open-source column store for real-time analytics:
CREATE TABLE events (
    event_date Date,
    event_type String,
    user_id    UInt64,
    data       String
) ENGINE = MergeTree()
ORDER BY (event_date, event_type);
-- Query billions of rows in seconds
SELECT
    event_type,
    count() AS cnt
FROM events
WHERE event_date >= '2024-01-01'
GROUP BY event_type
ORDER BY cnt DESC;
DuckDB
Embedded analytical database:
import duckdb

# Analyze Parquet files directly
result = duckdb.sql("""
    SELECT
        product_category,
        SUM(amount) AS total
    FROM 'sales/*.parquet'
    GROUP BY product_category
""").fetchall()

# Or query pandas DataFrames
import pandas as pd
df = pd.read_csv('large_file.csv')
duckdb.sql("SELECT AVG(value) FROM df WHERE category = 'A'")
Parquet File Format
Columnar storage format for data lakes:
Parquet File Structure:
┌─────────────────────────────────────────────────────────────┐
│ Row Group 1 │
│ Column Chunk: id (compressed) │
│ Column Chunk: name (compressed) │
│ Column Chunk: email (compressed) │
│ │
│ Row Group 2 │
│ Column Chunk: id (compressed) │
│ Column Chunk: name (compressed) │
│ Column Chunk: email (compressed) │
│ │
│ Footer: Schema, column statistics, offsets │
└─────────────────────────────────────────────────────────────┘
Benefits:
- Column pruning (skip columns)
- Row group pruning (skip row groups via statistics)
- Excellent compression
- Language-independent format
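A small sketch using pyarrow (assuming it is installed) to show column pruning and the footer statistics that readers use for row-group pruning; the file name and data are illustrative:
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "id":   [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "age":  [30, 25, 35],
})
pq.write_table(table, "users.parquet")

# Column pruning: read only the age column chunks.
ages = pq.read_table("users.parquet", columns=["age"])
print(ages.column_names)   # ['age']

# The footer carries per-column min/max statistics used for row-group pruning.
metadata = pq.ParquetFile("users.parquet").metadata
print(metadata.row_group(0).column(2).statistics)   # statistics for the 'age' column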
14.9 Hybrid Architectures
HTAP (Hybrid Transactional/Analytical Processing)
Single system for both workloads:
HTAP ARCHITECTURE
┌─────────────────────────────────────────────────────────────┐
│ HTAP Database │
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ │
│ │ Row Store │ │ Column Store │ │
│ │ (OLTP Engine) │ │ (OLAP Engine) │ │
│ │ │ │ │ │
│ │ Transactions │ │ Analytics │ │
│ │ Point queries │ │ Aggregations │ │
│ │ Updates │ │ Scans │ │
│ └──────────┬───────────┘ └───────────┬──────────┘ │
│ │ │ │
│ └──────────┬───────────────┘ │
│ │ │
│ Sync/Replication │
│ │
│ Examples: TiDB, SingleStore, Oracle Dual Format │
└─────────────────────────────────────────────────────────────┘
PostgreSQL + Analytics
-- Use a foreign data wrapper to reach a columnar store
CREATE EXTENSION clickhouse_fdw;

CREATE SERVER clickhouse_server
    FOREIGN DATA WRAPPER clickhouse_fdw
    OPTIONS (host 'analytics-server');

CREATE FOREIGN TABLE events_analytical (...)
    SERVER clickhouse_server;
-- OLTP in PostgreSQL, OLAP in ClickHouse
-- Sync via logical replication
14.10 When to Choose What
Choose Row Store When:
✓ OLTP workload
✓ Point queries by primary key
✓ Full-row access common
✓ Many small transactions
✓ Need strong ACID guarantees
✓ Complex transactions with updates
Examples: User data, orders, inventory
Choose Column Store When:
✓ OLAP workload
✓ Aggregate queries (SUM, AVG, COUNT)
✓ Accessing few columns from wide tables
✓ Large sequential scans
✓ Append-mostly data
✓ Historical/time-series data
Examples: Event logs, metrics, analytics
Consider Hybrid When:
✓ Need both OLTP and OLAP
✓ Real-time analytics on transactional data
✓ Can tolerate some complexity
✓ Single source of truth required
14.11 Performance Comparison
Illustrative benchmark: 1 billion rows, 100 columns
Query: SELECT AVG(col1) FROM table
Row Store: ~300 seconds
Column Store: ~3 seconds
Improvement: 100x
Query: SELECT * FROM table WHERE id = 12345
Row Store: ~1 millisecond (with index)
Column Store: ~100 milliseconds
Degradation: 100x
The right tool for the right job!
14.12 Summary
Column stores and row stores serve different needs:
| Aspect | Row Store | Column Store |
|---|---|---|
| Storage | Row by row | Column by column |
| Best for | OLTP | OLAP |
| Point queries | Fast | Slow |
| Full scans | Slow | Fast |
| Compression | Moderate | Excellent |
| Write speed | Fast | Slower |
| Update speed | Fast | Slow |
Key takeaways:
- Row stores excel at transactional workloads
- Column stores excel at analytical workloads
- Compression is much better in column stores
- Vectorized execution accelerates column processing
- Hybrid architectures try to provide both
What’s Next
In Chapter 15, we’ll explore distributed databases and replication—how databases scale beyond a single machine.
“A column store and a row store looking at the same data see entirely different things. Choose the perspective that matches your questions.”