Appendix B: Further Reading
This appendix provides recommendations for deepening your understanding of database internals.
Books
Database Fundamentals
“Database Internals” by Alex Petrov (O’Reilly, 2019) A comprehensive deep-dive into storage engines, distributed systems, and the algorithms powering modern databases. Covers both B-trees and LSM trees in depth.
“Designing Data-Intensive Applications” by Martin Kleppmann (O’Reilly, 2017) Essential reading for anyone building data systems. Covers storage, replication, partitioning, transactions, and distributed systems with exceptional clarity.
“Database System Concepts” by Silberschatz, Korth, and Sudarshan The standard academic textbook covering database theory, SQL, storage, indexing, transactions, and distributed databases.
Specific Topics
“Transaction Processing: Concepts and Techniques” by Jim Gray and Andreas Reuter (1993) The definitive reference on transaction processing, written by pioneers in the field. Covers ACID, recovery, and distributed transactions in depth.
“The Art of PostgreSQL” by Dimitri Fontaine A practical guide to PostgreSQL, covering SQL techniques, performance tuning, and real-world applications.
“High Performance MySQL” by Silvia Botros and Jeremy Tinley (O’Reilly, 4th Edition) Comprehensive coverage of MySQL architecture, optimization, replication, and operations.
Papers
Foundational Papers
“A Relational Model of Data for Large Shared Data Banks” — E.F. Codd (1970) The paper that started it all. Introduced the relational model that underlies SQL databases.
“The Design and Implementation of INGRES” — Stonebraker et al. (1976) Describes one of the first relational database implementations, influencing PostgreSQL’s design.
“ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking” — Mohan et al. (1992) The definitive paper on write-ahead logging and recovery. Still the basis for most database recovery systems.
Index Structures
“Organization and Maintenance of Large Ordered Indexes” — Bayer and McCreight (1972) The original B-tree paper.
“The Ubiquitous B-Tree” — Douglas Comer (1979) An accessible survey of B-tree variants and applications.
“The Log-Structured Merge-Tree (LSM-Tree)” — O’Neil et al. (1996) Introduced LSM trees, now used by LevelDB, RocksDB, Cassandra, and many others.
Concurrency and Transactions
“Granularity of Locks and Degrees of Consistency in a Shared Data Base” — Gray et al. (1976) Introduced isolation levels and hierarchical locking.
“A Critique of ANSI SQL Isolation Levels” — Berenson et al. (1995) Analyzes ANSI isolation levels and introduces snapshot isolation.
“Serializable Snapshot Isolation in PostgreSQL” — Ports and Grittner (2012) Describes PostgreSQL’s implementation of serializable isolation using SSI.
Distributed Systems
“Dynamo: Amazon’s Highly Available Key-value Store” — DeCandia et al. (2007) Influenced many NoSQL databases including Cassandra and Riak.
“Bigtable: A Distributed Storage System for Structured Data” — Chang et al. (2006) Google’s wide-column store that influenced HBase and Cassandra.
“Spanner: Google’s Globally-Distributed Database” — Corbett et al. (2012) Describes TrueTime and globally consistent distributed transactions.
“Calvin: Fast Distributed Transactions for Partitioned Database Systems” — Thomson et al. (2012) An alternative approach to distributed transactions using deterministic ordering.
“In Search of an Understandable Consensus Algorithm (Raft)” — Ongaro and Ousterhout (2014) A more accessible alternative to Paxos, widely implemented in modern systems.
Query Processing
“Access Path Selection in a Relational Database Management System” — Selinger et al. (1979) The classic paper on query optimization from System R.
“Volcano—An Extensible and Parallel Query Evaluation System” — Graefe (1994) Describes the iterator model used by most database executors.
“MonetDB/X100: Hyper-Pipelining Query Execution” — Boncz et al. (2005) Introduced vectorized query execution.
Online Resources
Documentation
PostgreSQL Documentation https://www.postgresql.org/docs/ Exceptionally thorough and readable. The internals sections are particularly valuable.
MySQL Reference Manual https://dev.mysql.com/doc/refman/ Comprehensive documentation including InnoDB internals.
SQLite Documentation https://www.sqlite.org/docs.html Includes detailed explanations of B-tree implementation and file format.
Blogs and Articles
The Morning Paper https://blog.acolyer.org/ Adrian Colyer’s summaries of computer science papers, including many database papers.
Use The Index, Luke https://use-the-index-luke.com/ An excellent resource on SQL indexing and performance.
Brandur’s Blog https://brandur.org/ Deep dives into PostgreSQL internals and database design.
CockroachDB Blog https://www.cockroachlabs.com/blog/ Technical articles on distributed databases and Raft implementation.
PlanetScale Blog https://planetscale.com/blog Articles on MySQL, Vitess, and distributed databases.
Video Resources
CMU Database Group (YouTube) Andy Pavlo’s database courses are freely available and excellent.
PGCon Talks Annual PostgreSQL conference talks cover internals in depth.
Source Code
Reading source code is one of the best ways to understand database internals.
Recommended Starting Points
SQLite https://sqlite.org/src/doc/trunk/README.md Small, readable codebase. Excellent for understanding B-trees and the query pipeline.
LevelDB https://github.com/google/leveldb Clean implementation of LSM trees. Well-documented code.
PostgreSQL https://github.com/postgres/postgres Large but well-organized. Start with the documentation on source code structure.
DuckDB https://github.com/duckdb/duckdb Modern analytical database with clean, modern C++ codebase.
Tools for Exploration
PostgreSQL
-- See execution plans
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
-- Examine buffer cache contents
CREATE EXTENSION pg_buffercache;
SELECT * FROM pg_buffercache;
-- View index structure
CREATE EXTENSION pageinspect;
SELECT * FROM bt_metap('index_name');
SELECT * FROM bt_page_items('index_name', 1);
-- Statistics
SELECT * FROM pg_stats WHERE tablename = 'your_table';
MySQL
-- Execution plans
EXPLAIN FORMAT=TREE SELECT ...;
-- InnoDB status
SHOW ENGINE INNODB STATUS\G
-- Index statistics
SELECT * FROM mysql.innodb_index_stats WHERE table_name = 'your_table';
System Tools
# I/O monitoring
iostat -x 1
# Memory monitoring
vmstat 1
# Trace system calls
strace -f -e read,write,fsync postgres
# Examine file blocks
xxd -l 8192 /path/to/tablefile
Conferences
SIGMOD (ACM Special Interest Group on Management of Data) Premier academic database conference.
VLDB (Very Large Data Bases) Major academic conference on database systems.
PGCon Annual PostgreSQL conference with deep technical content.
Percona Live MySQL and open-source database conference.
CMU Database Conference Industry/academic conference organized by CMU’s database group.
Practice Projects
The best way to learn is to build:
- Build a simple key-value store with persistence
- Implement a B-tree with insertion, deletion, and search
- Create a simple query executor with scan and filter operators
- Build a write-ahead log with crash recovery
- Implement MVCC for a simple database
Communities
PostgreSQL Mailing Lists https://www.postgresql.org/list/
MySQL Forums https://forums.mysql.com/
Database Internals Discord Community for discussing database internals
r/PostgreSQL and r/Database on Reddit Active communities for database discussions
“The more you learn about how databases work, the more you appreciate both their complexity and their elegance. Keep exploring!”