Chapter 1: Introduction - The Journey of a Query
“To understand how databases work, follow the data.”
When you type a SQL query and press Enter, a remarkable sequence of events unfolds. Your simple SELECT * FROM users WHERE id = 42 triggers a cascade of parsing, planning, optimization, and execution—all within milliseconds. Understanding this journey is the key to understanding databases.
Let’s trace exactly what happens.
1.1 A Simple Query’s Not-So-Simple Journey
Consider this query:
SELECT name, email FROM users WHERE id = 42;
Here’s what happens, step by step:
Step 1: The Query Arrives
Your application sends the SQL text over a network connection (or local socket) to the database server. The database’s connection handler receives the text and assigns it to a worker process or thread.
┌──────────────────┐ ┌──────────────────┐
│ Application │ SQL │ Database │
│ │ ──────► │ Server │
│ query("SELECT.. │ │ │
└──────────────────┘ └──────────────────┘
Step 2: Parsing
The parser takes the raw SQL text and converts it into a structured representation—an Abstract Syntax Tree (AST). This involves:
- Lexical analysis: Breaking the text into tokens (
SELECT,name,,,email,FROM, etc.) - Syntax analysis: Checking that tokens form valid SQL grammar
- Building the AST: Creating a tree structure representing the query
SELECT Statement
│
┌─────────┴─────────┐
│ │
Columns FROM
┌──┴──┐ │
name email users
│
WHERE
│
id = 42
If you write SELEKT instead of SELECT, this is where the error is caught.
Step 3: Semantic Analysis
The analyzer (or “binder”) verifies that the query makes sense:
- Does the
userstable exist? - Does it have
name,email, andidcolumns? - Is
id = 42a valid comparison (correct types)?
It also resolves any ambiguity (which id if there are multiple tables?) and expands wildcards (SELECT * becomes SELECT col1, col2, col3...).
Step 4: Query Optimization
Now comes the magic. The optimizer takes the logical query and figures out the best way to execute it. For our simple query, it might consider:
- Full table scan: Read every row in
users, check ifid = 42 - Index lookup: If there’s an index on
id, use it to jump directly to row 42
The optimizer uses statistics about the data (how many rows? how many distinct values?) to estimate the cost of each approach and picks the cheapest one.
OPTIMIZER DECISION
Option A: Full Table Scan Option B: Index Lookup
┌─────────────────────────┐ ┌─────────────────────────┐
│ Read all 1,000,000 rows │ │ Look up id=42 in index │
│ Cost: ~1,000,000 I/Os │ │ Read 1 row from table │
│ │ │ Cost: ~3 I/Os │
└─────────────────────────┘ └─────────────────────────┘
│
▼
WINNER!
The output is an execution plan—a recipe for how to retrieve the data.
Step 5: Execution
The executor runs the plan. For an index lookup:
- Navigate the B-tree index on
idto find the entry for42 - Follow the pointer to the actual row in the data file
- Read the row from disk (or buffer pool if cached)
- Extract the
nameandemailcolumns - Return the result to the client
But wait—what if that data page isn’t in memory?
Step 6: Buffer Pool Interaction
The executor doesn’t read directly from disk. Instead, it asks the buffer pool (also called buffer cache or page cache) for the needed pages. The buffer pool:
- Checks if the page is already in memory
- If yes, returns it immediately
- If no, reads it from disk, caches it, then returns it
┌─────────────────────────────────────────────────────────┐
│ BUFFER POOL │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │Page 1│ │Page 7│ │Page 3│ │Page 42│ │Page 9│ ... │
│ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │
│ ▲ │
│ │ HIT! │
│ Request: "I need page 42" │
└─────────────────────────────────────────────────────────┘
Step 7: Return Results
Finally, the result row is sent back to your application. The entire process might take under a millisecond if data is cached, or a few milliseconds if disk I/O is needed.
1.2 Why This Matters
Understanding this journey helps you:
Debug slow queries: Is the optimizer picking a bad plan? Is the buffer pool too small? Is there no suitable index?
Design better schemas: Knowing how indexes work helps you create the right ones.
Tune performance: Understanding the buffer pool helps you allocate memory effectively.
Choose the right database: Different databases make different trade-offs at each step.
1.3 The Write Path: What Happens on INSERT?
Reading is only half the story. Let’s trace an INSERT:
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
Step 1: Same Parsing and Planning
The query goes through parsing and planning, just like SELECT.
Step 2: Transaction Start
If not already in a transaction, an implicit one begins. The database assigns a transaction ID for tracking.
Step 3: Write-Ahead Log (WAL)
Before modifying any data, the database writes a log record describing the change to the Write-Ahead Log. This is crucial for crash recovery.
┌─────────────────────────────────────────────────────────┐
│ WAL (on disk) │
│ ┌─────────────────────────────────────────────────────┐│
│ │ LSN: 1001 | TXN: 42 | INSERT users | ('Alice',...) ││
│ └─────────────────────────────────────────────────────┘│
│ ▲ │
│ │ │
│ Write log FIRST │
└─────────────────────────────────────────────────────────┘
Step 4: Modify Data Page
Now the actual data page is modified—but only in the buffer pool (memory). The page becomes “dirty” (modified but not yet written to disk).
┌─────────────────────────────────────────────────────────┐
│ BUFFER POOL │
│ ┌──────────────────────────────────────┐ │
│ │ Page 7 (users table) │ ← DIRTY │
│ │ ┌────────────────────────────────┐ │ │
│ │ │ ... existing rows ... │ │ │
│ │ │ NEW: Alice, alice@example.com │ │ │
│ │ └────────────────────────────────┘ │ │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Step 5: Update Indexes
If there are indexes on users, they must be updated too. Each index gets a new entry pointing to the new row.
Step 6: Commit
When the transaction commits:
- A commit record is written to the WAL
- The WAL is flushed to disk (this is the durability guarantee!)
- The transaction is now permanent
Note: The actual data pages may still be dirty in memory. They’ll be written to disk later by a background process. But because the WAL has the changes, we can recover them after a crash.
1.4 The Key Abstractions
Throughout this book, we’ll explore several key abstractions that make databases work:
Pages
The fundamental unit of storage. Everything—data, indexes, metadata—is stored in fixed-size pages (typically 4KB, 8KB, or 16KB).
┌─────────────────────────────────────────────┐
│ PAGE (8KB typical) │
├─────────────────────────────────────────────┤
│ Page Header │
│ - Page ID │
│ - Free space pointer │
│ - Checksum │
├─────────────────────────────────────────────┤
│ │
│ Row 1: id=1, name='Bob', email='...' │
│ Row 2: id=2, name='Carol', email='...' │
│ Row 3: id=3, name='Dave', email='...' │
│ ... │
│ │
├─────────────────────────────────────────────┤
│ [Free Space] │
└─────────────────────────────────────────────┘
Indexes
Data structures (usually B-trees) that speed up lookups. An index on id lets you find id=42 without scanning all rows.
Buffer Pool
The memory cache for pages. Most database memory is dedicated to keeping frequently-used pages in RAM.
Write-Ahead Log
An append-only file recording all changes. Enables crash recovery and replication.
Transactions
A way to group operations so they all succeed or all fail together. Provides ACID guarantees.
Locks
Mechanisms to prevent concurrent transactions from interfering with each other.
1.5 The Storage Engine
At the heart of every database is the storage engine—the component responsible for storing, retrieving, and managing data on disk. Some databases have a single storage engine; others let you choose.
Examples:
- PostgreSQL: Has one integrated storage engine (with pluggable extensions)
- MySQL: Pluggable engines—InnoDB (default), MyISAM, Memory, etc.
- SQLite: Single embedded engine optimized for simplicity
- MongoDB: WiredTiger (default), previously MMAPv1
The storage engine handles:
- Data file format and organization
- Index structures and maintenance
- Buffer pool management
- Crash recovery
- Concurrency control
Everything else—SQL parsing, query optimization, networking—sits above the storage engine.
┌───────────────────────────────────────────────────────────┐
│ SQL Layer │
│ Parser │ Optimizer │ Executor │ Network │
├───────────────────────────────────────────────────────────┤
│ Storage Engine │
│ Buffer Pool │ B-trees │ WAL │ Transactions │ Recovery │
├───────────────────────────────────────────────────────────┤
│ Operating System │
│ File System │ Disk I/O │ Memory │
├───────────────────────────────────────────────────────────┤
│ Hardware │
│ SSD/HDD │ RAM │ CPU │
└───────────────────────────────────────────────────────────┘
1.6 What We’ll Cover
This book explores each layer of this stack in detail:
Part I (Chapters 1-3): Foundations—how data is physically stored and managed
Part II (Chapters 4-6): Indexing—the data structures that make queries fast
Part III (Chapters 7-9): Transactions—how databases handle concurrency and guarantee durability
Part IV (Chapters 10-12): Query Processing—how SQL becomes execution plans
Part V (Chapters 13-15): Reliability and Scale—crash recovery, storage architectures, and distributed systems
1.7 A Note on Examples
Throughout this book, we’ll use PostgreSQL as our primary reference implementation. It’s open source, well-documented, and represents a mature, full-featured relational database. However, the concepts apply broadly:
- B-trees work the same way in PostgreSQL, MySQL, Oracle, and SQL Server
- WAL (Write-Ahead Logging) is used by virtually all modern databases
- MVCC principles are similar across implementations (though details differ)
- Buffer pool management uses similar algorithms everywhere
Where implementations differ significantly, we’ll note it.
Summary
- A SQL query goes through parsing, optimization, and execution
- The optimizer chooses the best execution plan based on statistics
- The buffer pool caches pages to avoid disk I/O
- Writes go to the WAL first, then to data pages
- The storage engine is the heart of the database
- Understanding this flow helps you debug and optimize
What’s Next
In Chapter 2, we’ll dive deep into storage engines and file formats—how databases actually organize bits on disk. We’ll see why page size matters, how rows are packed into pages, and the trade-offs different formats make.
“The query optimizer is the second-best place to put smarts in a database. The first is in the schema design.”