Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Database Internals: Where Your Data Actually Lives

A CloudStreet Educational Book

Written by Opus 4.5


“A database is not just a collection of tables—it’s a carefully orchestrated symphony of data structures, algorithms, and system design decisions, all working together to answer your queries while keeping your data safe.”


Preface

Every day, billions of database transactions occur around the world. When you check your bank balance, post a photo, or search for a product, databases spring into action—reading pages from disk, navigating tree structures, acquiring locks, and committing transactions. Yet for most developers, databases remain mysterious black boxes. SQL goes in, data comes out, and everything in between is magic.

This book aims to demystify that magic.

We’ll explore how databases actually store your data on disk, how they find it quickly using sophisticated index structures, how they allow thousands of users to read and write simultaneously without corrupting anything, and how they recover gracefully when servers crash. Along the way, we’ll encounter some of the most elegant algorithms and data structures in computer science—concepts that have been refined over five decades of database research.

Understanding database internals isn’t just academic curiosity. This knowledge helps you:

  • Write better queries by understanding what the database actually does with them
  • Design better schemas by knowing how data layout affects performance
  • Debug performance issues by recognizing symptoms of common problems
  • Choose the right database by understanding the trade-offs different architectures make
  • Build better systems by learning from decades of distributed systems research

Whether you’re a backend developer, a data engineer, or simply someone who wants to understand how their data is managed, this book will give you the foundation to think about databases at a deeper level.

Let’s begin our journey into the engine room.


Part I: Foundations

Before we can understand the sophisticated algorithms that make databases fast, we need to understand the physical reality they’re built on. Databases don’t exist in abstract mathematical space—they run on real machines with spinning disks (or flash storage), limited memory, and CPUs that can only do one thing at a time. The fundamental challenge of database design is bridging the gap between the logical world of tables and queries and the physical world of bytes on disk.

┌─────────────────────────────────────────────────────────────────┐
│                        YOUR APPLICATION                          │
│                     SELECT * FROM users                          │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                        SQL INTERFACE                             │
│              Parser → Planner → Optimizer → Executor             │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      STORAGE ENGINE                              │
│            Buffer Pool ← → Transaction Manager                   │
│                    ↓               ↓                             │
│              Index Manager    Lock Manager                       │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                         DISK                                     │
│     Data Files    │    Index Files    │    WAL Files             │
│   ┌──┬──┬──┬──┐   │   ┌──┬──┬──┬──┐   │   ┌──┬──┬──┬──┐         │
│   │▓▓│▓▓│░░│▓▓│   │   │▓▓│▓▓│▓▓│▓▓│   │   │▓▓│▓▓│▓▓│░░│         │
│   └──┴──┴──┴──┘   │   └──┴──┴──┴──┘   │   └──┴──┴──┴──┘         │
└─────────────────────────────────────────────────────────────────┘

In Part I, we’ll build up from the physical layer:

  • Chapter 1 introduces the journey of a query from SQL text to disk operations
  • Chapter 2 explores storage engines and how they organize data in files
  • Chapter 3 dives into the realities of disk I/O and page-based storage

Chapter 1: Introduction - The Journey of a Query

Chapter 2: Storage Engines and File Formats

Chapter 3: Disk I/O and Page Management


Part II: Data Structures

With an understanding of how data lives on disk, we can now explore the data structures that make finding that data efficient. A naive approach—scanning every record to find a match—works fine for ten records but becomes catastrophic at ten million. Index structures solve this problem, trading storage space and write overhead for dramatic read performance improvements.

                    WITHOUT INDEX                    WITH B-TREE INDEX

Query: WHERE id = 42                    Query: WHERE id = 42

┌──────────────────────┐                ┌──────────────────────┐
│ Scan ALL 1,000,000   │                │ Navigate 3-4 tree    │
│ records sequentially │                │ levels to find it    │
│                      │                │                      │
│ Time: O(n)           │                │ Time: O(log n)       │
│ = 1,000,000 reads    │                │ = ~20 reads          │
└──────────────────────┘                └──────────────────────┘

In Part II, we’ll explore the major index structures:

  • Chapter 4 covers B-trees and B+ trees—the workhorses of database indexing
  • Chapter 5 introduces LSM trees—the write-optimized alternative
  • Chapter 6 examines hash indexes and specialized structures

Chapter 4: Indexing Structures - B-Trees and Beyond

Chapter 5: LSM Trees and Write-Optimized Structures

Chapter 6: Hash Indexes and Specialized Structures


Part III: Transactions and Concurrency

Databases would be much simpler if only one user could access them at a time. But that’s not how the real world works. Modern databases handle thousands of concurrent connections, each trying to read and write data simultaneously. Without careful management, this concurrency leads to chaos—lost updates, phantom reads, and corrupted data.

        THE CONCURRENCY CHALLENGE

    Transaction A              Transaction B
         │                          │
         ▼                          ▼
    Read balance: $100         Read balance: $100
         │                          │
         ▼                          ▼
    Add $50                    Subtract $30
    Balance = $150             Balance = $70
         │                          │
         ▼                          ▼
    Write balance              Write balance
         │                          │
         ▼                          ▼
    ┌─────────────────────────────────────┐
    │   FINAL BALANCE: $70 or $150?       │
    │   (Should be $120!)                  │
    │   This is a LOST UPDATE             │
    └─────────────────────────────────────┘

Part III explores how databases maintain order in this chaos:

  • Chapter 7 explains write-ahead logging and durability guarantees
  • Chapter 8 covers MVCC and transaction isolation levels
  • Chapter 9 examines locking strategies and concurrency control

Chapter 7: Write-Ahead Logging (WAL)

Chapter 8: MVCC and Transaction Isolation

Chapter 9: Locking and Concurrency Control


Part IV: Query Processing

So far, we’ve focused on how databases store and retrieve data at the storage engine level. But there’s a crucial layer above this: the query processor. This is where SQL—a declarative language describing what data you want—gets transformed into an execution plan describing how to get it.

                    THE QUERY PROCESSING PIPELINE

    "SELECT name FROM users WHERE age > 21"
                      │
                      ▼
              ┌───────────────┐
              │    PARSER     │  → Syntax check, build AST
              └───────────────┘
                      │
                      ▼
              ┌───────────────┐
              │   ANALYZER    │  → Semantic check, resolve names
              └───────────────┘
                      │
                      ▼
              ┌───────────────┐
              │   OPTIMIZER   │  → Find best execution plan
              └───────────────┘
                      │
                      ▼
              ┌───────────────┐
              │   EXECUTOR    │  → Run the plan, return results
              └───────────────┘
                      │
                      ▼
              [ Results ]

Part IV explores this transformation:

  • Chapter 10 covers query parsing and planning fundamentals
  • Chapter 11 dives deep into query optimization
  • Chapter 12 examines buffer pools and caching strategies

Chapter 10: Query Parsing and Planning

Chapter 11: Query Optimization

Chapter 12: Buffer Pools and Caching


Part V: Reliability and Scale

Everything we’ve discussed so far assumes a single database server running without failures. Reality is messier: servers crash, disks fail, networks partition, and eventually one machine isn’t enough to handle all your data. The final part of this book addresses these challenges.

           FROM SINGLE NODE TO DISTRIBUTED

    ┌─────────────┐          ┌─────────────────────────────────┐
    │   Single    │          │       Distributed Cluster        │
    │   Server    │    →     │  ┌─────┐  ┌─────┐  ┌─────┐      │
    │  ┌───────┐  │          │  │Node1│  │Node2│  │Node3│      │
    │  │  DB   │  │          │  │ DB  │←→│ DB  │←→│ DB  │      │
    │  └───────┘  │          │  └─────┘  └─────┘  └─────┘      │
    └─────────────┘          └─────────────────────────────────┘

    Simple, but:              Complex, but:
    - Single point of         - High availability
      failure                 - Horizontal scaling
    - Limited scale           - Geographic distribution

Part V covers reliability and scaling:

  • Chapter 13 explains recovery mechanisms and crash safety
  • Chapter 14 compares column stores and row stores
  • Chapter 15 introduces distributed databases and replication

Chapter 13: Recovery and Crash Safety

Chapter 14: Column Stores vs Row Stores

Chapter 15: Distributed Databases and Replication


Appendices

Appendix A: Glossary of Terms

Appendix B: Further Reading


Closing Thoughts

Databases are among the most complex and important pieces of software ever created. They embody decades of research in data structures, distributed systems, and systems programming. Every time you save a document, make a purchase, or post a message, database engineers’ work ensures your data is stored safely and retrieved quickly.

My hope is that this book has lifted the veil on some of that complexity. You now understand why B-trees are shaped the way they are, how MVCC enables concurrent access without locking, why write-ahead logging is essential for durability, and how query optimizers choose execution plans. This knowledge will serve you well—whether you’re debugging a slow query, designing a new system, or simply appreciating the engineering marvels that power our digital world.

Databases will continue to evolve. New hardware like NVMe SSDs and persistent memory are changing the I/O assumptions that shaped classical designs. Distributed systems are becoming the norm rather than the exception. New workloads in machine learning and real-time analytics are driving innovation in storage formats and query processing. But the fundamentals—the need to organize data for efficient access, maintain consistency under concurrent access, and survive failures—remain constant.

Thank you for joining me on this journey into database internals. Now go forth and write better queries.


“The best performance improvement is the transition from the nonworking state to the working state.”

— John Ousterhout


Database Internals: Where Your Data Actually Lives

A CloudStreet Educational Book

Written by Opus 4.5

© 2024 CloudStreet Educational Series