Chapter 14: Column Stores vs Row Stores

“How you store data determines what you can do with it efficiently.”

Traditional databases store data row by row. For analytical queries that scan millions of rows yet touch only a few columns, that layout is wasteful. Column stores flip the model and store data column by column. This chapter explores both approaches and when to use each.


14.1 The Row Store Model (NSM)

Row stores use the N-ary Storage Model (NSM), in which all columns of a row are stored together:

                    ROW STORE LAYOUT

    Table: users (id, name, email, age, city)

    Disk/Page Layout:
    ┌──────────────────────────────────────────────────────────────────┐
    │ Row 1: [id=1, name='Alice', email='a@b.com', age=30, city='NYC'] │
    │ Row 2: [id=2, name='Bob',   email='b@b.com', age=25, city='LA']  │
    │ Row 3: [id=3, name='Carol', email='c@b.com', age=35, city='NYC'] │
    │ ...                                                               │
    └──────────────────────────────────────────────────────────────────┘

    Query: SELECT * FROM users WHERE id = 1
    → Read one contiguous chunk → All columns available → Fast!
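
To make the layout concrete, here is a minimal Python sketch. The fixed-width field sizes are assumptions for illustration; real pages add headers, slot arrays, and variable-length fields. The point is that every column of a row lands in one contiguous chunk.

import struct

# Hypothetical fixed-width layout: id (4 bytes), name (16), email (16),
# age (4), city (8). Sizes are illustrative assumptions, not a real format.
ROW_FORMAT = "<i16s16si8s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)

def pack_row(id_, name, email, age, city):
    # NSM: every column of one row serialized into one contiguous chunk.
    return struct.pack(ROW_FORMAT, id_, name.encode(), email.encode(),
                       age, city.encode())

page = pack_row(1, "Alice", "a@b.com", 30, "NYC") + \
       pack_row(2, "Bob", "b@b.com", 25, "LA")

# Point query by row position: one contiguous read yields every column.
row = struct.unpack_from(ROW_FORMAT, page, 0 * ROW_SIZE)
print(row[0], row[3])  # 1 30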

Row Store Strengths

    OLTP Workloads:
    ┌─────────────────────────────────────────────────────────────┐
    │ SELECT * FROM users WHERE id = 42                            │
    │ → Fetch one row, all columns, one I/O                       │
    │                                                              │
    │ INSERT INTO users VALUES (...)                               │
    │ → Write one contiguous chunk                                 │
    │                                                              │
    │ UPDATE users SET email = '...' WHERE id = 42                │
    │ → Read row, modify, write back                               │
    └─────────────────────────────────────────────────────────────┘

    Excellent for: Point queries, full-row access, transactions

Row Store Weaknesses

    Analytical Query:
    SELECT AVG(age) FROM users;

    ┌──────────────────────────────────────────────────────────────────┐
    │ Row 1: [id=1, name='Alice', email='a@b.com', AGE=30, city='NYC'] │
    │ Row 2: [id=2, name='Bob',   email='b@b.com', AGE=25, city='LA']  │
    │ Row 3: [id=3, name='Carol', email='c@b.com', AGE=35, city='NYC'] │
    └──────────────────────────────────────────────────────────────────┘
                                                      │
                                          Only need AGE column!
                                          But must read entire rows

    If each row is 200 bytes and age is 4 bytes:
    Reading 1M rows = 200MB of I/O
    Actually need only 4MB (2% of data!)

14.2 The Column Store Model (DSM)

Column stores use the Decomposition Storage Model (DSM), in which each column is stored separately:

                    COLUMN STORE LAYOUT

    Table: users (id, name, email, age, city)

    id column:    [1, 2, 3, 4, 5, 6, 7, ...]
    name column:  ['Alice', 'Bob', 'Carol', 'Dave', ...]
    email column: ['a@b.com', 'b@b.com', 'c@b.com', ...]
    age column:   [30, 25, 35, 28, 42, 31, 27, ...]
    city column:  ['NYC', 'LA', 'NYC', 'Chicago', ...]

    Query: SELECT AVG(age) FROM users
    → Read only age column
    → 4MB instead of 200MB
    → 50x less I/O!
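
A matching sketch of the same table in DSM form. In-memory Python arrays stand in for the per-column files a real engine would keep on disk; the aggregate scans one small array and never touches the other columns.

from array import array

# DSM sketch: each column is its own contiguous array.
ids   = array("i", [1, 2, 3])
ages  = array("i", [30, 25, 35])
names  = ["Alice", "Bob", "Carol"]   # variable-length columns
cities = ["NYC", "LA", "NYC"]

# SELECT AVG(age) FROM users: scan one small array, skip everything else.
print(sum(ages) / len(ages))  # 30.0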

Column Store Strengths

    Analytical Queries:
    ┌─────────────────────────────────────────────────────────────┐
    │ SELECT AVG(age), city FROM users GROUP BY city               │
    │ → Read only age and city columns                             │
    │ → Ignore id, name, email                                     │
    │                                                              │
    │ SELECT COUNT(*) FROM users WHERE age > 30                   │
    │ → Read only age column                                       │
    │ → Extremely fast filtering                                   │
    └─────────────────────────────────────────────────────────────┘

    Benefits:
    - Read only columns you need
    - Better compression (similar values together)
    - Vectorized processing (operate on column chunks)
    - Better CPU cache utilization

Column Store Weaknesses

    Point Query:
    SELECT * FROM users WHERE id = 42

    ┌─────────────────────────────────────────────────────────────┐
    │ Must read from 5 different locations (one per column)       │
    │ Then reconstruct the row                                     │
    │ Much slower than row store for this pattern!                │
    └─────────────────────────────────────────────────────────────┘

    INSERT INTO users VALUES (...)
    ┌─────────────────────────────────────────────────────────────┐
    │ Must write to 5 different locations                         │
    │ Columnar format may require append buffers                  │
    │ Slower for single-row operations                            │
    └─────────────────────────────────────────────────────────────┘

14.3 Compression in Column Stores

Same-type values stored together compress extremely well:

Run-Length Encoding (RLE)

    city column: ['NYC', 'NYC', 'NYC', 'LA', 'LA', 'NYC', ...]

    RLE: [(NYC, 3), (LA, 2), (NYC, 1), ...]

    If sorted by city:
    ['LA', 'LA', ..., 'NYC', 'NYC', ...]
    RLE: [(LA, 50000), (NYC, 150000), ...]
    → Massive compression!
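
A minimal run-length encoder and decoder in Python (function names are illustrative):

from itertools import groupby

def rle_encode(values):
    # Collapse each run of equal consecutive values into (value, count).
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

cities = ["NYC", "NYC", "NYC", "LA", "LA", "NYC"]
runs = rle_encode(cities)
print(runs)                   # [('NYC', 3), ('LA', 2), ('NYC', 1)]
assert rle_decode(runs) == cities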

Dictionary Encoding

    city column: ['NYC', 'LA', 'NYC', 'Chicago', 'NYC', ...]

    Dictionary: {0: 'NYC', 1: 'LA', 2: 'Chicago'}
    Encoded:    [0, 1, 0, 2, 0, ...]

    Benefits:
    - Fixed-width codes (faster processing)
    - Much smaller storage
    - Comparisons on codes, not strings
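
A dictionary-encoding sketch in Python (the helper is hypothetical): distinct values get small integer codes, and predicates then compare codes instead of strings.

def dict_encode(values):
    # Assign each distinct value a small integer code, in first-seen order.
    dictionary = {}
    codes = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return dictionary, codes

cities = ["NYC", "LA", "NYC", "Chicago", "NYC"]
dictionary, codes = dict_encode(cities)
print(dictionary)  # {'NYC': 0, 'LA': 1, 'Chicago': 2}
print(codes)       # [0, 1, 0, 2, 0]

# Predicate city = 'NYC' becomes an integer comparison on the codes.
nyc = dictionary["NYC"]
print([i for i, c in enumerate(codes) if c == nyc])  # [0, 2, 4]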

Bit-Packing

    age column: [30, 25, 35, 28, 42, 31, 27, ...]

    All values < 128, so need only 7 bits each!
    Instead of 32 bits per integer → 4.5x compression
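
A bit-packing sketch in Python. A single big integer serves as the bit buffer to keep the sketch short; real engines pack into fixed-size machine words.

def bit_pack(values, width):
    # Pack each value into `width` bits of one big integer (LSB first).
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed

def bit_unpack(packed, width, count):
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]

ages = [30, 25, 35, 28, 42, 31, 27]
packed = bit_pack(ages, 7)             # every age < 128 fits in 7 bits
assert bit_unpack(packed, 7, len(ages)) == ages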

Compression Ratios

    Row Store:    1x–3x typical
    Column Store: 5x–20x typical

    Example:
    Raw data: 100GB
    Row store (compressed): 50GB
    Column store (compressed): 5–10GB

14.4 Vectorized Execution

Column stores enable vectorized processing—operating on batches of values:

                    SCALAR VS VECTORIZED

    Scalar (row-at-a-time):
    for each row:
        if row.age > 30:
            output row

    → Function call overhead per row
    → Poor CPU cache utilization
    → Branch mispredictions

    Vectorized (column-at-a-time):
    age_vector = load 1000 ages
    mask = age_vector > 30       # SIMD comparison
    output = apply mask          # Batch filtering

    → Process 1000 values with one CPU instruction
    → Excellent cache utilization
    → No branches, predictable execution
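
The same contrast in runnable form, with NumPy standing in for a vectorized engine (NumPy is an assumption here, not any column store's actual runtime):

import numpy as np

ages = np.random.randint(18, 80, size=1_000_000)

def scalar_count(ages):
    # Row-at-a-time: one comparison and one branch per value.
    count = 0
    for age in ages:
        if age > 30:
            count += 1
    return count

def vectorized_count(ages):
    # Column-at-a-time: one comparison over the whole batch, which the
    # runtime can execute with SIMD and without per-row branches.
    return int((ages > 30).sum())

assert scalar_count(ages) == vectorized_count(ages)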

SIMD Operations

Modern CPUs can process multiple values simultaneously:

    AVX-512: Process 16 32-bit integers at once

    Traditional:
    for i in 0..16:
        result[i] = a[i] + b[i]
    → 16 add instructions

    SIMD:
    result = _mm512_add_epi32(a, b)
    → 1 instruction for 16 adds!

14.5 Late Materialization

Column stores delay row reconstruction:

    Query: SELECT name, email FROM users WHERE age > 30 AND city = 'NYC'

    Early Materialization (reconstruct rows first):
    1. Scan age column, get matching row IDs: {1, 3, 5, 7, ...}
    2. For each ID, reconstruct full row
    3. Filter by city
    4. Extract name, email

    Late Materialization (defer reconstruction):
    1. Scan age column, get positions: {1, 3, 5, 7, ...}
    2. Scan city column at those positions only
    3. Filter: positions {1, 3, ...} remain
    4. Only now fetch name, email for final positions

    Benefits:
    - Skip columns that get filtered out
    - Less data movement
    - Better cache utilization
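
A late-materialization sketch over in-memory columns in Python; the data and names are made up to mirror the query above.

# Columns for the query above; positions play the role of row IDs.
ages   = [35, 28, 42, 31, 25, 40]
cities = ["NYC", "LA", "NYC", "NYC", "NYC", "SF"]
names  = ["Ann", "Bob", "Cy", "Dee", "Eve", "Flo"]
emails = ["a@x", "b@x", "c@x", "d@x", "e@x", "f@x"]

# 1. Scan the age column; keep positions only, not reconstructed rows.
positions = [i for i, age in enumerate(ages) if age > 30]   # [0, 2, 3, 5]

# 2. Probe the city column at the surviving positions only.
positions = [i for i in positions if cities[i] == "NYC"]    # [0, 2, 3]

# 3. Materialize name and email just for the final positions.
print([(names[i], emails[i]) for i in positions])
# [('Ann', 'a@x'), ('Cy', 'c@x'), ('Dee', 'd@x')]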

14.6 Column Store Architectures

Pure Column Stores

Store nothing but columns:

    Examples: ClickHouse, DuckDB, Vertica

    Each column in separate file
    Row reconstruction via position matching

Hybrid Row/Column

Store data in columnar format within row-oriented pages:

    PAX (Partition Attributes Across):
    ┌─────────────────────────────────────────────────────────────┐
    │ Page contains N rows, stored column-by-column within page   │
    │                                                              │
    │ [all ids] [all names] [all emails] [all ages] [all cities] │
    │     │          │          │           │           │         │
    │   Row 1-N    Row 1-N    Row 1-N     Row 1-N     Row 1-N    │
    └─────────────────────────────────────────────────────────────┘

    Benefits:
    - Column-oriented within page (good compression, vectorization)
    - Row locality across pages (good for reconstruction)
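
A toy PAX page in Python (the layout is a sketch, not any engine's on-disk format): rows arrive together, but inside the page each column sits in its own minipage.

rows = [(1, "Alice", 30), (2, "Bob", 25), (3, "Carol", 35)]

page = {
    "id":   [r[0] for r in rows],   # minipage for column 0
    "name": [r[1] for r in rows],   # minipage for column 1
    "age":  [r[2] for r in rows],   # minipage for column 2
}

# Columnar scan stays contiguous within the page...
print(sum(page["age"]) / len(page["age"]))             # 30.0
# ...and row reconstruction is cheap: all minipages share the page.
print(page["id"][0], page["name"][0], page["age"][0])  # 1 Alice 30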

PostgreSQL Column Extensions

-- cstore_fdw (now Citus columnar)
CREATE TABLE events (
    event_id bigint,
    event_type text,
    data jsonb,
    created_at timestamp
) USING columnar;

-- Benefits: Compression, column pruning
-- Trade-off: Slower updates, best for append-only

14.7 OLTP vs OLAP

The fundamental difference in workload patterns:

                    OLTP vs OLAP

    OLTP (Online Transaction Processing):
    ┌─────────────────────────────────────────────────────────────┐
    │ • Many small transactions                                    │
    │ • Point queries: SELECT * WHERE id = ?                      │
    │ • Single-row INSERT, UPDATE, DELETE                         │
    │ • Low latency critical                                       │
    │ • Concurrent users                                           │
    │                                                              │
    │ Best: Row stores (PostgreSQL, MySQL, Oracle)                │
    └─────────────────────────────────────────────────────────────┘

    OLAP (Online Analytical Processing):
    ┌─────────────────────────────────────────────────────────────┐
    │ • Few large queries                                          │
    │ • Full scans: SELECT AVG(x) FROM big_table                  │
    │ • Read-mostly, batch loads                                   │
    │ • Throughput critical                                        │
    │ • Few concurrent queries                                     │
    │                                                              │
    │ Best: Column stores (ClickHouse, DuckDB, Snowflake)         │
    └─────────────────────────────────────────────────────────────┘

14.8 Column Store Examples

ClickHouse

Open-source column store for real-time analytics:

CREATE TABLE events (
    event_date Date,
    event_type String,
    user_id UInt64,
    data String
) ENGINE = MergeTree()
ORDER BY (event_date, event_type);

-- Query billions of rows in seconds
SELECT
    event_type,
    count() as cnt
FROM events
WHERE event_date >= '2024-01-01'
GROUP BY event_type
ORDER BY cnt DESC;

DuckDB

Embedded analytical database:

import duckdb

# Analyze Parquet files directly
result = duckdb.sql("""
    SELECT
        product_category,
        SUM(amount) as total
    FROM 'sales/*.parquet'
    GROUP BY product_category
""").fetchall()

# Or query pandas DataFrames
import pandas as pd
df = pd.read_csv('large_file.csv')
duckdb.sql("SELECT AVG(value) FROM df WHERE category = 'A'")

Parquet File Format

Columnar storage format for data lakes:

    Parquet File Structure:
    ┌─────────────────────────────────────────────────────────────┐
    │ Row Group 1                                                  │
    │   Column Chunk: id (compressed)                             │
    │   Column Chunk: name (compressed)                           │
    │   Column Chunk: email (compressed)                          │
    │                                                              │
    │ Row Group 2                                                  │
    │   Column Chunk: id (compressed)                             │
    │   Column Chunk: name (compressed)                           │
    │   Column Chunk: email (compressed)                          │
    │                                                              │
    │ Footer: Schema, column statistics, offsets                  │
    └─────────────────────────────────────────────────────────────┘

    Benefits:
    - Column pruning (skip columns)
    - Row group pruning (skip row groups via statistics)
    - Excellent compression
    - Language-independent format
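
A small pyarrow example of both pruning mechanisms (assumes pyarrow is installed; the file name is arbitrary):

import pyarrow as pa
import pyarrow.parquet as pq

# Write a tiny Parquet file split into two row groups.
table = pa.table({"id": list(range(10)),
                  "name": [f"u{i}" for i in range(10)]})
pq.write_table(table, "users.parquet", row_group_size=5)

# Column pruning: read only the 'id' column chunks; 'name' is never touched.
ids = pq.read_table("users.parquet", columns=["id"])

# Row-group pruning relies on footer statistics (min/max per column chunk).
meta = pq.ParquetFile("users.parquet").metadata
stats = meta.row_group(0).column(0).statistics
print(stats.min, stats.max)  # 0 4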

14.9 Hybrid Architectures

HTAP (Hybrid Transactional/Analytical Processing)

Single system for both workloads:

                    HTAP ARCHITECTURE

    ┌─────────────────────────────────────────────────────────────┐
    │                     HTAP Database                            │
    │                                                              │
    │  ┌──────────────────────┐  ┌──────────────────────┐        │
    │  │   Row Store          │  │   Column Store        │        │
    │  │   (OLTP Engine)      │  │   (OLAP Engine)       │        │
    │  │                      │  │                       │        │
    │  │  Transactions        │  │  Analytics            │        │
    │  │  Point queries       │  │  Aggregations         │        │
    │  │  Updates             │  │  Scans                │        │
    │  └──────────┬───────────┘  └───────────┬──────────┘        │
    │             │                          │                     │
    │             └──────────┬───────────────┘                     │
    │                        │                                     │
    │              Sync/Replication                                │
    │                                                              │
    │  Examples: TiDB, SingleStore, Oracle Dual Format            │
    └─────────────────────────────────────────────────────────────┘

PostgreSQL + Analytics

-- Use foreign data wrapper to columnar store
CREATE EXTENSION clickhouse_fdw;

CREATE SERVER clickhouse_server
    FOREIGN DATA WRAPPER clickhouse_fdw
    OPTIONS (host 'analytics-server');

CREATE FOREIGN TABLE events_analytical (...)
    SERVER clickhouse_server;

-- OLTP in PostgreSQL, OLAP in ClickHouse
-- Sync via logical replication

14.10 When to Choose What

Choose Row Store When:

    ✓ OLTP workload
    ✓ Point queries by primary key
    ✓ Full-row access common
    ✓ Many small transactions
    ✓ Need strong ACID guarantees
    ✓ Complex transactions with updates

    Examples: User data, orders, inventory

Choose Column Store When:

    ✓ OLAP workload
    ✓ Aggregate queries (SUM, AVG, COUNT)
    ✓ Accessing few columns from wide tables
    ✓ Large sequential scans
    ✓ Append-mostly data
    ✓ Historical/time-series data

    Examples: Event logs, metrics, analytics

Consider Hybrid When:

    ✓ Need both OLTP and OLAP
    ✓ Real-time analytics on transactional data
    ✓ Can tolerate some complexity
    ✓ Single source of truth required

14.11 Performance Comparison

    Illustrative benchmark: 1 billion rows, 100 columns

    Query: SELECT AVG(col1) FROM table

    Row Store:    ~300 seconds
    Column Store: ~3 seconds
    Improvement:  100x

    Query: SELECT * FROM table WHERE id = 12345

    Row Store:    ~1 millisecond (with index)
    Column Store: ~100 milliseconds
    Degradation:  100x

    The right tool for the right job!

14.12 Summary

Column stores and row stores serve different needs:

| Aspect        | Row Store  | Column Store     |
|---------------|------------|------------------|
| Storage       | Row by row | Column by column |
| Best for      | OLTP       | OLAP             |
| Point queries | Fast       | Slow             |
| Full scans    | Slow       | Fast             |
| Compression   | Moderate   | Excellent        |
| Write speed   | Fast       | Slower           |
| Update speed  | Fast       | Slow             |

Key takeaways:

  • Row stores excel at transactional workloads
  • Column stores excel at analytical workloads
  • Compression is much better in column stores
  • Vectorized execution accelerates column processing
  • Hybrid architectures try to provide both

What’s Next

In Chapter 15, we’ll explore distributed databases and replication—how databases scale beyond a single machine.


“A column store and a row store looking at the same data see entirely different things. Choose the perspective that matches your questions.”