Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 1: Introduction - The Journey of a Query

“To understand how databases work, follow the data.”

When you type a SQL query and press Enter, a remarkable sequence of events unfolds. Your simple SELECT * FROM users WHERE id = 42 triggers a cascade of parsing, planning, optimization, and execution—all within milliseconds. Understanding this journey is the key to understanding databases.

Let’s trace exactly what happens.


1.1 A Simple Query’s Not-So-Simple Journey

Consider this query:

SELECT name, email FROM users WHERE id = 42;

Here’s what happens, step by step:

Step 1: The Query Arrives

Your application sends the SQL text over a network connection (or local socket) to the database server. The database’s connection handler receives the text and assigns it to a worker process or thread.

┌──────────────────┐         ┌──────────────────┐
│   Application    │   SQL   │    Database      │
│                  │ ──────► │    Server        │
│  query("SELECT.. │         │                  │
└──────────────────┘         └──────────────────┘

Step 2: Parsing

The parser takes the raw SQL text and converts it into a structured representation—an Abstract Syntax Tree (AST). This involves:

  1. Lexical analysis: Breaking the text into tokens (SELECT, name, ,, email, FROM, etc.)
  2. Syntax analysis: Checking that tokens form valid SQL grammar
  3. Building the AST: Creating a tree structure representing the query
         SELECT Statement
              │
    ┌─────────┴─────────┐
    │                   │
 Columns              FROM
 ┌──┴──┐                │
name  email          users
                        │
                     WHERE
                        │
                   id = 42

If you write SELEKT instead of SELECT, this is where the error is caught.

Step 3: Semantic Analysis

The analyzer (or “binder”) verifies that the query makes sense:

  • Does the users table exist?
  • Does it have name, email, and id columns?
  • Is id = 42 a valid comparison (correct types)?

It also resolves any ambiguity (which id if there are multiple tables?) and expands wildcards (SELECT * becomes SELECT col1, col2, col3...).

Step 4: Query Optimization

Now comes the magic. The optimizer takes the logical query and figures out the best way to execute it. For our simple query, it might consider:

  • Full table scan: Read every row in users, check if id = 42
  • Index lookup: If there’s an index on id, use it to jump directly to row 42

The optimizer uses statistics about the data (how many rows? how many distinct values?) to estimate the cost of each approach and picks the cheapest one.

                    OPTIMIZER DECISION

    Option A: Full Table Scan          Option B: Index Lookup
    ┌─────────────────────────┐        ┌─────────────────────────┐
    │ Read all 1,000,000 rows │        │ Look up id=42 in index  │
    │ Cost: ~1,000,000 I/Os   │        │ Read 1 row from table   │
    │                         │        │ Cost: ~3 I/Os           │
    └─────────────────────────┘        └─────────────────────────┘
                                                  │
                                                  ▼
                                            WINNER!

The output is an execution plan—a recipe for how to retrieve the data.

Step 5: Execution

The executor runs the plan. For an index lookup:

  1. Navigate the B-tree index on id to find the entry for 42
  2. Follow the pointer to the actual row in the data file
  3. Read the row from disk (or buffer pool if cached)
  4. Extract the name and email columns
  5. Return the result to the client

But wait—what if that data page isn’t in memory?

Step 6: Buffer Pool Interaction

The executor doesn’t read directly from disk. Instead, it asks the buffer pool (also called buffer cache or page cache) for the needed pages. The buffer pool:

  1. Checks if the page is already in memory
  2. If yes, returns it immediately
  3. If no, reads it from disk, caches it, then returns it
┌─────────────────────────────────────────────────────────┐
│                     BUFFER POOL                          │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐          │
│  │Page 1│ │Page 7│ │Page 3│ │Page 42│ │Page 9│  ...     │
│  └──────┘ └──────┘ └──────┘ └──────┘ └──────┘          │
│                         ▲                                │
│                         │ HIT!                           │
│          Request: "I need page 42"                       │
└─────────────────────────────────────────────────────────┘

Step 7: Return Results

Finally, the result row is sent back to your application. The entire process might take under a millisecond if data is cached, or a few milliseconds if disk I/O is needed.


1.2 Why This Matters

Understanding this journey helps you:

Debug slow queries: Is the optimizer picking a bad plan? Is the buffer pool too small? Is there no suitable index?

Design better schemas: Knowing how indexes work helps you create the right ones.

Tune performance: Understanding the buffer pool helps you allocate memory effectively.

Choose the right database: Different databases make different trade-offs at each step.


1.3 The Write Path: What Happens on INSERT?

Reading is only half the story. Let’s trace an INSERT:

INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');

Step 1: Same Parsing and Planning

The query goes through parsing and planning, just like SELECT.

Step 2: Transaction Start

If not already in a transaction, an implicit one begins. The database assigns a transaction ID for tracking.

Step 3: Write-Ahead Log (WAL)

Before modifying any data, the database writes a log record describing the change to the Write-Ahead Log. This is crucial for crash recovery.

┌─────────────────────────────────────────────────────────┐
│                    WAL (on disk)                         │
│  ┌─────────────────────────────────────────────────────┐│
│  │ LSN: 1001 | TXN: 42 | INSERT users | ('Alice',...)  ││
│  └─────────────────────────────────────────────────────┘│
│                         ▲                                │
│                         │                                │
│                  Write log FIRST                         │
└─────────────────────────────────────────────────────────┘

Step 4: Modify Data Page

Now the actual data page is modified—but only in the buffer pool (memory). The page becomes “dirty” (modified but not yet written to disk).

┌─────────────────────────────────────────────────────────┐
│                     BUFFER POOL                          │
│  ┌──────────────────────────────────────┐               │
│  │ Page 7 (users table)                 │ ← DIRTY       │
│  │ ┌────────────────────────────────┐   │               │
│  │ │ ... existing rows ...          │   │               │
│  │ │ NEW: Alice, alice@example.com  │   │               │
│  │ └────────────────────────────────┘   │               │
│  └──────────────────────────────────────┘               │
└─────────────────────────────────────────────────────────┘

Step 5: Update Indexes

If there are indexes on users, they must be updated too. Each index gets a new entry pointing to the new row.

Step 6: Commit

When the transaction commits:

  1. A commit record is written to the WAL
  2. The WAL is flushed to disk (this is the durability guarantee!)
  3. The transaction is now permanent

Note: The actual data pages may still be dirty in memory. They’ll be written to disk later by a background process. But because the WAL has the changes, we can recover them after a crash.


1.4 The Key Abstractions

Throughout this book, we’ll explore several key abstractions that make databases work:

Pages

The fundamental unit of storage. Everything—data, indexes, metadata—is stored in fixed-size pages (typically 4KB, 8KB, or 16KB).

┌─────────────────────────────────────────────┐
│              PAGE (8KB typical)              │
├─────────────────────────────────────────────┤
│ Page Header                                  │
│   - Page ID                                  │
│   - Free space pointer                       │
│   - Checksum                                 │
├─────────────────────────────────────────────┤
│                                             │
│ Row 1: id=1, name='Bob', email='...'        │
│ Row 2: id=2, name='Carol', email='...'      │
│ Row 3: id=3, name='Dave', email='...'       │
│ ...                                         │
│                                             │
├─────────────────────────────────────────────┤
│ [Free Space]                                │
└─────────────────────────────────────────────┘

Indexes

Data structures (usually B-trees) that speed up lookups. An index on id lets you find id=42 without scanning all rows.

Buffer Pool

The memory cache for pages. Most database memory is dedicated to keeping frequently-used pages in RAM.

Write-Ahead Log

An append-only file recording all changes. Enables crash recovery and replication.

Transactions

A way to group operations so they all succeed or all fail together. Provides ACID guarantees.

Locks

Mechanisms to prevent concurrent transactions from interfering with each other.


1.5 The Storage Engine

At the heart of every database is the storage engine—the component responsible for storing, retrieving, and managing data on disk. Some databases have a single storage engine; others let you choose.

Examples:

  • PostgreSQL: Has one integrated storage engine (with pluggable extensions)
  • MySQL: Pluggable engines—InnoDB (default), MyISAM, Memory, etc.
  • SQLite: Single embedded engine optimized for simplicity
  • MongoDB: WiredTiger (default), previously MMAPv1

The storage engine handles:

  • Data file format and organization
  • Index structures and maintenance
  • Buffer pool management
  • Crash recovery
  • Concurrency control

Everything else—SQL parsing, query optimization, networking—sits above the storage engine.

┌───────────────────────────────────────────────────────────┐
│                      SQL Layer                             │
│        Parser │ Optimizer │ Executor │ Network            │
├───────────────────────────────────────────────────────────┤
│                   Storage Engine                           │
│    Buffer Pool │ B-trees │ WAL │ Transactions │ Recovery  │
├───────────────────────────────────────────────────────────┤
│                   Operating System                         │
│              File System │ Disk I/O │ Memory              │
├───────────────────────────────────────────────────────────┤
│                      Hardware                              │
│              SSD/HDD │ RAM │ CPU                          │
└───────────────────────────────────────────────────────────┘

1.6 What We’ll Cover

This book explores each layer of this stack in detail:

Part I (Chapters 1-3): Foundations—how data is physically stored and managed

Part II (Chapters 4-6): Indexing—the data structures that make queries fast

Part III (Chapters 7-9): Transactions—how databases handle concurrency and guarantee durability

Part IV (Chapters 10-12): Query Processing—how SQL becomes execution plans

Part V (Chapters 13-15): Reliability and Scale—crash recovery, storage architectures, and distributed systems


1.7 A Note on Examples

Throughout this book, we’ll use PostgreSQL as our primary reference implementation. It’s open source, well-documented, and represents a mature, full-featured relational database. However, the concepts apply broadly:

  • B-trees work the same way in PostgreSQL, MySQL, Oracle, and SQL Server
  • WAL (Write-Ahead Logging) is used by virtually all modern databases
  • MVCC principles are similar across implementations (though details differ)
  • Buffer pool management uses similar algorithms everywhere

Where implementations differ significantly, we’ll note it.


Summary

  • A SQL query goes through parsing, optimization, and execution
  • The optimizer chooses the best execution plan based on statistics
  • The buffer pool caches pages to avoid disk I/O
  • Writes go to the WAL first, then to data pages
  • The storage engine is the heart of the database
  • Understanding this flow helps you debug and optimize

What’s Next

In Chapter 2, we’ll dive deep into storage engines and file formats—how databases actually organize bits on disk. We’ll see why page size matters, how rows are packed into pages, and the trade-offs different formats make.


“The query optimizer is the second-best place to put smarts in a database. The first is in the schema design.”