Chapter 11: Query Optimization

“The optimizer’s job is to find a needle in a haystack of possible plans—then realize it should be looking for a different needle.”

A single query can be executed in thousands of different ways. The optimizer’s job is to find a good one—ideally the best, but at least not terrible. This is one of the most complex components in any database system.


11.1 Why Optimization Matters

Consider a simple join:

SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'USA';

Plan A: Scan orders first

For each of 10,000,000 orders:
    Look up customer
    Check if country = 'USA'
Cost: ~10,000,000 index lookups

Plan B: Filter customers first

Find customers where country = 'USA' (100,000 customers)
For each, find their orders via index
Cost: ~100,000 index lookups + filtered scan

Plan B is 100x faster. The optimizer’s job is to find it.
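The back-of-the-envelope arithmetic behind that claim, sketched in Python (the row counts are the illustrative figures above; a real cost model weighs I/O and CPU separately):

```python
# Illustrative lookup counts for the two plans above.
ORDERS = 10_000_000          # rows in orders
USA_CUSTOMERS = 100_000      # customers matching country = 'USA'

plan_a_lookups = ORDERS          # Plan A: one customer lookup per order
plan_b_lookups = USA_CUSTOMERS   # Plan B: one orders-index probe per USA customer

print(plan_a_lookups // plan_b_lookups)  # → 100
```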


11.2 The Optimization Process

                    QUERY OPTIMIZATION PHASES

    ┌─────────────────────────────────────────────────────────────┐
    │ 1. GENERATE CANDIDATE PLANS                                  │
    │    - Different join orders                                   │
    │    - Different access methods (seq scan, index scan)        │
    │    - Different join methods (nested loop, hash, merge)      │
    └─────────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌─────────────────────────────────────────────────────────────┐
    │ 2. ESTIMATE COSTS                                            │
    │    - Use statistics about data                               │
    │    - Model I/O, CPU, memory costs                           │
    │    - Estimate result sizes (cardinality)                    │
    └─────────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌─────────────────────────────────────────────────────────────┐
    │ 3. SELECT BEST PLAN                                          │
    │    - Compare total costs                                     │
    │    - Return cheapest plan                                    │
    └─────────────────────────────────────────────────────────────┘

11.3 The Plan Space Explosion

For a query joining N tables, there are:

  • N! different join orders
  • Multiple access methods per table
  • Multiple join algorithms per join
    3 tables: 3! = 6 join orders
    5 tables: 5! = 120 join orders
    10 tables: 10! = 3,628,800 join orders

    Plus access methods and join types!

Exhaustive search is impossible for large queries. Optimizers use heuristics and dynamic programming.
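To see how quickly the space blows up, a quick sketch of the factorial growth:

```python
import math

# Join orders grow factorially with the number of tables
# (this counts orderings only; bushy plan shapes add even more).
for n in (3, 5, 10, 12):
    print(f"{n} tables: {math.factorial(n):,} join orders")
```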


11.4 Cost Estimation

The Cost Model

Optimizers estimate cost based on:

    Total Cost = CPU Cost + I/O Cost + Memory Cost

    I/O Cost:
    - Sequential page reads (cheap)
    - Random page reads (expensive)
    - Index accesses

    CPU Cost:
    - Tuple processing
    - Expression evaluation
    - Sorting overhead

    Memory Cost:
    - Hash table building
    - Sort buffers

Cost Parameters (PostgreSQL)

-- View current cost parameters
SHOW seq_page_cost;        -- 1.0 (baseline)
SHOW random_page_cost;     -- 4.0 (4x sequential)
SHOW cpu_tuple_cost;       -- 0.01
SHOW cpu_index_tuple_cost; -- 0.005
SHOW cpu_operator_cost;    -- 0.0025

-- Adjust for SSDs (random I/O is faster)
SET random_page_cost = 1.1;

Example Cost Calculation

    Seq Scan on users (10,000 rows, 100 pages):
    Cost = (100 pages × seq_page_cost) + (10,000 rows × cpu_tuple_cost)
    Cost = (100 × 1.0) + (10,000 × 0.01) = 200

    Index Scan (estimate 100 rows, 100 index + heap accesses):
    Cost = (100 × random_page_cost) + (100 × cpu_index_tuple_cost)
    Cost = (100 × 4.0) + (100 × 0.005) = 400.5

    Sequential scan wins: 100 random page fetches cost more than reading
    all 100 pages sequentially plus processing every row. With a more
    selective predicate (fewer matching rows), the index scan would win.
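The same arithmetic as a small Python sketch, using the default parameters shown earlier. This is a simplification: real PostgreSQL costing also models selectivity, index correlation, and startup cost.

```python
# PostgreSQL's default cost parameters (see the SHOW commands above).
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005

def seq_scan_cost(pages, rows):
    # Read every page sequentially, process every tuple.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

def index_scan_cost(matching_rows):
    # Simplified: one random page fetch plus one index-tuple touch per match.
    return matching_rows * RANDOM_PAGE_COST + matching_rows * CPU_INDEX_TUPLE_COST

print(seq_scan_cost(100, 10_000))  # → 200.0
print(index_scan_cost(100))        # → 400.5
```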

11.5 Statistics and Cardinality Estimation

What Statistics Are Stored

Databases maintain statistics about each column:

-- PostgreSQL: View statistics
SELECT
    attname,
    n_distinct,     -- Number of distinct values
    null_frac,      -- Fraction of NULLs
    avg_width,      -- Average value width in bytes
    correlation     -- Physical vs logical ordering
FROM pg_stats
WHERE tablename = 'users';

Histograms

For non-uniform distributions, histograms capture value frequencies:

    Column: age
    Histogram buckets:

    [0-18]:  500 rows   (5%)
    [19-25]: 2000 rows  (20%)
    [26-35]: 3500 rows  (35%)
    [36-50]: 2500 rows  (25%)
    [51+]:   1500 rows  (15%)

    Query: WHERE age > 35
    Estimated selectivity: 25% + 15% = 40%
    Estimated rows: 10,000 × 0.40 = 4,000
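The bucket arithmetic above can be written out directly. Buckets here are (low, high, rows) tuples; a real system would also interpolate inside a partially matched bucket.

```python
# Histogram from the age example: (low, high, row_count) per bucket.
# The upper bound of the last bucket (120) is an assumed cap.
buckets = [(0, 18, 500), (19, 25, 2000), (26, 35, 3500),
           (36, 50, 2500), (51, 120, 1500)]
total_rows = sum(rows for _, _, rows in buckets)

def selectivity_gt(threshold):
    # Sum rows from buckets lying entirely above the threshold
    # (no intra-bucket interpolation in this sketch).
    matching = sum(rows for low, _, rows in buckets if low > threshold)
    return matching / total_rows

sel = selectivity_gt(35)
print(sel, round(total_rows * sel))  # → 0.4 4000
```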

Most Common Values (MCV)

For skewed distributions, store the most common values:

    Column: country
    Most Common Values:
    USA:     4000 (40%)
    UK:      1500 (15%)
    Canada:  1000 (10%)
    Germany: 800  (8%)
    Other:   2700 (27%)

    Query: WHERE country = 'USA'
    Estimated rows: 4000 (exact from MCV!)

    Query: WHERE country = 'France'
    Not in MCV; estimate from the remainder: 2700 / (n_distinct - 4 MCV entries) ≈ 270 (assuming 14 distinct countries)
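A sketch of the MCV logic. The row counts come from the example above; n_distinct = 14 is an assumed figure chosen so the non-MCV arithmetic works out.

```python
TOTAL_ROWS = 10_000
N_DISTINCT = 14                      # assumed total distinct countries
mcv = {"USA": 4000, "UK": 1500, "Canada": 1000, "Germany": 800}

def estimate_rows(value):
    if value in mcv:
        return mcv[value]            # exact frequency stored in the MCV list
    # Spread the remaining rows evenly across non-MCV distinct values.
    remaining = TOTAL_ROWS - sum(mcv.values())
    return remaining / (N_DISTINCT - len(mcv))

print(estimate_rows("USA"))     # → 4000
print(estimate_rows("France"))  # → 270.0
```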

Updating Statistics

-- PostgreSQL: Manual statistics update
ANALYZE users;
ANALYZE users(email);  -- Specific column

-- View when last analyzed
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables;

Autovacuum updates statistics automatically, but manual ANALYZE may be needed after bulk loads.


11.6 Join Optimization

Join Order Selection

For a join of tables A, B, C, possible orders include:

  • (A ⋈ B) ⋈ C
  • (A ⋈ C) ⋈ B
  • (B ⋈ C) ⋈ A
  • (B ⋈ A) ⋈ C
  • etc.

The optimizer uses dynamic programming to find the best order:

    1. Find best way to access each single table
    2. Find best way to join each pair of tables
    3. Find best way to join each triple (using best pairs)
    4. Continue until all tables are joined
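The four steps above map onto a dynamic program over table subsets (Selinger-style). A toy version, with assumed cardinalities and a flat 1% join selectivity, both hypothetical numbers chosen for illustration:

```python
from itertools import combinations

card = {"A": 1000, "B": 100, "C": 10}   # assumed base-table row counts
SELECTIVITY = 0.01                      # assumed selectivity per join edge

def join_rows(tables):
    # Estimated output rows for joining a set of tables.
    rows = 1
    for t in tables:
        rows *= card[t]
    return rows * SELECTIVITY ** (len(tables) - 1)

tables = tuple(sorted(card))
# best[subset] = (cost, plan string); seed with single-table access costs.
best = {frozenset([t]): (card[t], t) for t in tables}

for size in range(2, len(tables) + 1):
    for subset in combinations(tables, size):
        s = frozenset(subset)
        candidates = []
        for t in subset:                 # join one more table into the rest
            rest = s - {t}
            rest_cost, rest_plan = best[rest]
            # Toy cost: sub-plan cost + rows it emits + rows scanned from t.
            cost = rest_cost + join_rows(rest) + card[t]
            candidates.append((cost, f"({rest_plan} ⋈ {t})"))
        best[s] = min(candidates)

print(best[frozenset(tables)])  # cheapest plan joins the two small tables first
```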

Join Methods

Nested Loop Join

For each row in outer:
    For each row in inner:
        If join condition matches:
            Output combined row

Best when: Outer is small, inner has index
Cost: O(N × M) without index, O(N × log M) with index

Hash Join

Build hash table from smaller relation
For each row in larger relation:
    Probe hash table
    Output matches

Best when: No useful indexes, one relation fits in memory
Cost: O(N + M) if hash table fits in memory
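The build/probe loop above is short enough to write out. A minimal in-memory version; the dict-based rows and key names are illustrative:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: hash the smaller relation on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: one hash lookup per row of the larger relation.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            out.append({**match, **row})
    return out

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [{"order_id": 10, "cust": 1}, {"order_id": 11, "cust": 1},
          {"order_id": 12, "cust": 3}]
print(hash_join(customers, orders, "id", "cust"))  # two matched rows
```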

Merge Join

Sort both relations by join key
Merge like merging sorted lists

Best when: Both relations already sorted, or sort is cheap
Cost: O(N log N + M log M) for sorting + O(N + M) for merge
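A minimal merge phase, assuming both inputs arrive sorted on the join key and the left side's keys are unique (handling duplicates on both sides requires a rewind step, omitted here for brevity):

```python
def merge_join(left, right):
    # Both inputs are lists of (key, payload) tuples, pre-sorted by key;
    # left keys are assumed unique in this sketch.
    out, j = [], 0
    for lkey, lval in left:
        while j < len(right) and right[j][0] < lkey:
            j += 1                       # skip past smaller right keys
        k = j
        while k < len(right) and right[k][0] == lkey:
            out.append((lkey, lval, right[k][1]))
            k += 1                       # emit every duplicate match
    return out

left = [(1, "Ada"), (2, "Bob"), (4, "Cy")]
right = [(1, "x"), (1, "y"), (2, "z"), (3, "w")]
print(merge_join(left, right))  # three matched rows
```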

                    JOIN METHOD COMPARISON

    Nested Loop       Hash Join         Merge Join
    ┌─────────┐      ┌─────────┐       ┌─────────┐
    │ Outer   │      │  Hash   │       │ Sorted  │
    │  ↓      │      │  Table  │       │    A    │
    │ For each│      │ ┌─┬─┬─┐ │       │   ↓↓↓   │
    │  row:   │      │ │ │ │ │ │       │  Merge  │
    │ ┌─────┐ │      │ └─┴─┴─┘ │       │   ↓↓↓   │
    │ │Index│ │      │    ▲    │       │ Sorted  │
    │ │Scan │ │      │    │    │       │    B    │
    │ └─────┘ │      │ Probe   │       │         │
    └─────────┘      └─────────┘       └─────────┘

11.7 Access Path Selection

For each table, choose how to access it:

Sequential Scan

Read all pages, filter in memory.

Best when:
- Need most rows from table
- No useful index
- Table is small

Index Scan

Use index to find matching rows, fetch from heap.

Best when:
- High selectivity (few rows match)
- Index exists on filter column
- Results need to be ordered by indexed column

Index-Only Scan

Get all needed data from index, skip heap.

Best when:
- All needed columns are in the index
- Visibility map shows pages are all-visible

Bitmap Index Scan

Build bitmap from index, then scan heap in physical order.

Best when:
- Medium selectivity
- Multiple conditions can use different indexes
- Random I/O from regular index scan would be expensive

                    ACCESS PATH DECISION TREE

    ┌─────────────────────────────────────┐
    │ What percentage of rows needed?     │
    └─────────────────────────────────────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
      >20%        5-20%       <5%
        │           │           │
        ▼           ▼           ▼
    Seq Scan    Bitmap Scan   Index Scan

11.8 Predicate Pushdown

Move filters as close to data as possible:

-- Original
SELECT * FROM (
    SELECT * FROM orders JOIN customers ON orders.cust_id = customers.id
) sub
WHERE sub.country = 'USA';

-- After pushdown
SELECT * FROM orders
JOIN (SELECT * FROM customers WHERE country = 'USA') c
ON orders.cust_id = c.id;

Benefit: Filter 90% of customers before the join instead of after.


11.9 Join Reordering

Optimizer chooses the best join order:

SELECT * FROM A, B, C WHERE A.x = B.x AND B.y = C.y;

-- Possible orders:
-- 1. (A ⋈ B) ⋈ C
-- 2. (B ⋈ C) ⋈ A
-- 3. (A ⋈ C) ⋈ B  -- Cartesian product, usually bad!

-- If A has 1000 rows, B has 100 rows, C has 10 rows:
-- Order 2 is best: (B ⋈ C) produces small result, then join A
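The intermediate-result arithmetic behind that comment, with an assumed 1% selectivity per join predicate:

```python
A, B, C = 1000, 100, 10   # row counts from the comment above
SEL = 0.01                # assumed selectivity of each equality predicate

order1 = A * B * SEL      # (A ⋈ B) first: 1,000 intermediate rows
order2 = B * C * SEL      # (B ⋈ C) first: 10 intermediate rows
order3 = A * C            # (A ⋈ C) has no predicate: a 10,000-row Cartesian product

print(order1, order2, order3)
```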

Controlling Join Order

-- PostgreSQL: Disable join reordering
SET join_collapse_limit = 1;

-- Force specific order with explicit JOIN
SELECT * FROM A
JOIN B ON A.x = B.x
JOIN C ON B.y = C.y;

-- vs implicit join (optimizer chooses order)
SELECT * FROM A, B, C WHERE A.x = B.x AND B.y = C.y;

11.10 Common Optimization Techniques

Constant Folding

Evaluate constant expressions at plan time:

-- Before
SELECT * FROM orders WHERE date > '2024-01-01'::date + interval '30 days';

-- After constant folding
SELECT * FROM orders WHERE date > '2024-01-31';

Predicate Simplification

-- Before
SELECT * FROM users WHERE age > 10 AND age > 20;

-- After
SELECT * FROM users WHERE age > 20;

IN to JOIN Transformation

-- Before
SELECT * FROM orders WHERE customer_id IN (
    SELECT id FROM customers WHERE country = 'USA'
);

-- After
SELECT orders.* FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE customers.country = 'USA';

EXISTS to Semi-Join

-- Before
SELECT * FROM departments d
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.id);

-- After: executed as a semi-join (an internal plan operator, not SQL syntax):
-- probe employees and stop at the first match per department
--   departments d SEMI JOIN employees e ON e.dept_id = d.id

11.11 When the Optimizer Gets It Wrong

Outdated Statistics

-- Just loaded 1M rows, stats say table has 1000 rows
ANALYZE my_table;  -- Fix it

Correlation Not Captured

-- city='NYC' AND state='NY' are correlated
-- Optimizer assumes independence: P(NYC) × P(NY)
-- Reality: P(NYC AND NY) ≈ P(NYC)

-- PostgreSQL 10+: Extended statistics
CREATE STATISTICS stats_city_state ON city, state FROM addresses;
ANALYZE addresses;
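The independence error is easy to quantify. The fractions below are assumptions for illustration; the point is that nearly every NYC row is also an NY row:

```python
TOTAL = 1_000_000
p_nyc = 0.05    # assumed: 5% of addresses have city = 'NYC'
p_ny = 0.08     # assumed: 8% of addresses have state = 'NY'

# Independence assumption: multiply the selectivities.
independent_estimate = TOTAL * p_nyc * p_ny
# Reality: NYC implies NY, so the conjunction is just P(NYC).
actual = TOTAL * p_nyc

print(round(independent_estimate), round(actual))  # → 4000 50000
```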

Parameter Sniffing (Prepared Statements)

-- First call: WHERE country = 'Liechtenstein' (few rows, use index)
-- Plan cached
-- Later call: WHERE country = 'USA' (40% of rows, seq scan is better)
-- But the cached plan still uses the index scan!

Function Calls Hide Information

-- Optimizer can't see inside function
SELECT * FROM users WHERE my_function(email) = 'something';
-- Can't use index on email!

11.12 Influencing the Optimizer

Hints (MySQL, Oracle)

-- MySQL
SELECT /*+ INDEX(users idx_age) */ * FROM users WHERE age > 30;
SELECT /*+ NO_INDEX(users) */ * FROM users WHERE age > 30;

-- Oracle
SELECT /*+ FULL(users) */ * FROM users WHERE age > 30;
SELECT /*+ USE_HASH(orders customers) */ * FROM orders JOIN customers ...;

PostgreSQL: No Hints, But…

-- Disable specific plan types
SET enable_seqscan = off;        -- Discourage seq scans
SET enable_hashjoin = off;       -- Discourage hash joins
SET enable_mergejoin = off;      -- Discourage merge joins

-- After running query, re-enable
RESET enable_seqscan;

Better: Fix the Root Cause

  1. Update statistics: ANALYZE table
  2. Create appropriate indexes
  3. Create extended statistics for correlations
  4. Rewrite the query
  5. Adjust cost parameters for your hardware

11.13 Analyzing Query Plans

What to Look For

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42;

Index Scan using idx_orders_customer on orders
    (cost=0.43..8.45 rows=1 width=48)
    (actual time=0.019..0.021 rows=5 loops=1)
  Index Cond: (customer_id = 42)
  Buffers: shared hit=4
Planning Time: 0.065 ms
Execution Time: 0.035 ms

Red flags to watch for:

  • rows=1 estimated vs rows=1000000 actual (bad estimate!)
  • Seq Scan on large table with selective filter
  • Nested Loop with large inner table
  • Sort with external merge (spilling to disk)
  • Very high Buffers: shared read (cache misses)

Comparing Estimated vs Actual

    Estimated rows=100, Actual rows=100000

    The optimizer thought this would return 100 rows.
    It actually returned 100,000.

    Every downstream operation is based on wrong estimate!
    Likely cause: Outdated statistics or unrecognized correlation
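One common way to quantify this kind of misestimation is the q-error: the larger of estimate/actual and actual/estimate, so over- and under-estimates are penalized symmetrically:

```python
def q_error(estimated, actual):
    # Ratio-based error; 1.0 means a perfect estimate.
    estimated, actual = max(estimated, 1), max(actual, 1)
    return max(estimated / actual, actual / estimated)

print(q_error(100, 100_000))  # → 1000.0
print(q_error(100, 100))      # → 1.0
```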

11.14 Summary

Query optimization finds efficient execution plans:

  • Cost model estimates I/O, CPU, and memory costs
  • Statistics provide data distribution information
  • Cardinality estimation predicts result sizes
  • Join optimization chooses order and method
  • Access path selection chooses how to read tables

The optimizer can be wrong due to:

  • Outdated or missing statistics
  • Unrecognized correlations
  • Parameter sniffing
  • Complex expressions

Understanding optimization helps you:

  • Write optimizer-friendly queries
  • Diagnose slow queries
  • Create appropriate indexes
  • Know when to intervene

What’s Next

In Chapter 12, we’ll explore buffer pools and caching—how databases keep frequently-used data in memory.


“The optimizer is an oracle that predicts the future based on the past. When the past is wrong, so is the future.”