Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 8: MVCC and Transaction Isolation

“MVCC: Because readers should never wait for writers.”

When multiple transactions access the database simultaneously, chaos ensues—unless the database carefully manages what each transaction can see. MVCC (Multi-Version Concurrency Control) and transaction isolation levels are the mechanisms that maintain order.


8.1 The Concurrency Problem

Consider two transactions running simultaneously:

-- Transaction A                    -- Transaction B
BEGIN;                              BEGIN;
SELECT balance FROM accounts
  WHERE id = 1;
-- Returns: $1000
                                    UPDATE accounts
                                      SET balance = balance - 100
                                      WHERE id = 1;
                                    COMMIT;

SELECT balance FROM accounts
  WHERE id = 1;
-- Returns: $1000 or $900?

Should Transaction A see B’s change? It depends on what guarantees we want.


8.2 ACID: The Transaction Guarantees

Transactions provide four guarantees, known as ACID:

Atomicity

All or nothing. Either all changes commit, or none do.

    Transfer $100:
    1. Deduct from Account A
    2. Add to Account B

    If crash between steps:
    - Without atomicity: A lost $100, B gained nothing
    - With atomicity: Transaction rolls back, both unchanged

Consistency

Database moves from one valid state to another. Constraints are maintained.

    Constraint: balance >= 0

    Account has $50, tries to deduct $100:
    - Without consistency: Balance = -$50 (invalid)
    - With consistency: Transaction rejected

Isolation

Concurrent transactions don’t interfere. Each sees a consistent view.

    Transaction A reads balance
    Transaction B modifies balance
    Transaction A reads balance again

    - Without isolation: A sees different values
    - With isolation: A sees consistent view

Durability

Once committed, changes survive crashes.

    COMMIT succeeds
    Server crashes
    Server restarts

    - Without durability: Data might be lost
    - With durability: Committed data is there

8.3 MVCC: The Key Insight

Multi-Version Concurrency Control maintains multiple versions of each row:

                    MVCC: MULTIPLE VERSIONS

    Logical row: accounts WHERE id = 1

    Physical storage:
    ┌─────────────────────────────────────────────────────────────┐
    │ Version 1: balance=$1000, created=TxID 100, deleted=TxID 150│
    │ Version 2: balance=$900,  created=TxID 150, deleted=∞       │
    └─────────────────────────────────────────────────────────────┘

    Transaction 140 (started before 150):
    - Sees Version 1 (created 100 ≤ 140, deleted 150 > 140)

    Transaction 160 (started after 150):
    - Sees Version 2 (created 150 ≤ 160, deleted ∞)

Key Benefits

  1. Readers don’t block writers: Readers see old version while writer modifies
  2. Writers don’t block readers: Writer creates new version, readers see old
  3. Consistent snapshots: Each transaction sees data as of its start time

8.4 How MVCC Works: PostgreSQL

Row Structure

Each row has hidden system columns:

    Physical Row (tuple):
    ┌─────────────────────────────────────────────────────────────┐
    │ xmin:  Transaction that created this version (100)          │
    │ xmax:  Transaction that deleted/updated this version (150)  │
    │ ctid:  Current tuple ID (physical location)                 │
    │ data:  balance=$1000, name='Alice', ...                     │
    └─────────────────────────────────────────────────────────────┘

Visibility Rules

A row version is visible to transaction T if:

  1. xmin is committed AND xmin started before T
  2. xmax is empty OR xmax is not committed OR xmax started after T
    Transaction 140 checks visibility of Version 1:
    - xmin = 100 (committed, before 140) ✓
    - xmax = 150 (not yet committed OR after 140) ✓
    - VISIBLE

    Transaction 160 checks visibility of Version 1:
    - xmin = 100 (committed, before 160) ✓
    - xmax = 150 (committed, before 160) ✗
    - NOT VISIBLE (deleted)

UPDATE Creates New Version

UPDATE accounts SET balance = 900 WHERE id = 1;
    Before (TxID 150):
    ┌────────────────────────────────────────────────────────────┐
    │ xmin=100, xmax=∞, balance=$1000                            │
    └────────────────────────────────────────────────────────────┘

    After:
    ┌────────────────────────────────────────────────────────────┐
    │ xmin=100, xmax=150, balance=$1000  ← Old version (deleted) │
    └────────────────────────────────────────────────────────────┘
    ┌────────────────────────────────────────────────────────────┐
    │ xmin=150, xmax=∞, balance=$900     ← New version           │
    └────────────────────────────────────────────────────────────┘

The old row isn’t physically deleted—it’s marked as deleted by setting xmax.

VACUUM: Cleaning Up Old Versions

Old versions accumulate. VACUUM removes them when no transaction can see them:

    Before VACUUM:
    [xmin=100, xmax=150, balance=$1000]  ← Dead, no one can see
    [xmin=150, xmax=∞, balance=$900]     ← Current

    After VACUUM:
    [xmin=150, xmax=∞, balance=$900]     ← Only current remains

VACUUM is crucial for PostgreSQL performance. Without it, tables bloat with dead rows.


8.5 How MVCC Works: MySQL InnoDB

InnoDB uses a different MVCC implementation:

Undo Logs

Instead of keeping old versions in the table, InnoDB stores them in undo logs:

                    INNODB MVCC

    Clustered Index (current data):
    ┌─────────────────────────────────────────────────────────────┐
    │ id=1, balance=$900, TxID=150, roll_ptr=→                   │
    └────────────────────────────────────┬────────────────────────┘
                                         │
                                         ▼
    Undo Log:
    ┌─────────────────────────────────────────────────────────────┐
    │ Previous version: balance=$1000, TxID=100, roll_ptr=NULL    │
    └─────────────────────────────────────────────────────────────┘

Read View

When a transaction starts, it gets a read view:

    Read View for Transaction 140:
    - m_low_limit_id: 160 (TxIDs ≥ this are invisible)
    - m_up_limit_id: 100 (TxIDs < this are visible)
    - m_ids: [150, 155] (active transactions at start, invisible)

To read a row, follow undo chain until finding a visible version.

Purge Thread

InnoDB’s purge thread removes old undo records when no transaction needs them.


8.6 Transaction Isolation Levels

SQL defines four isolation levels, offering different trade-offs:

                    ISOLATION LEVEL SPECTRUM

    Less Isolation                              More Isolation
    Faster                                      More Correct
    ◄───────────────────────────────────────────────────────────►

    Read         Read          Repeatable      Serializable
    Uncommitted  Committed     Read

Read Uncommitted

See uncommitted changes from other transactions. (Rarely used)

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Transaction A                    -- Transaction B
BEGIN;                              BEGIN;
                                    UPDATE accounts SET balance = 0
                                      WHERE id = 1;
                                    -- NOT committed yet!

SELECT balance FROM accounts
  WHERE id = 1;
-- Returns: $0 (dirty read!)
                                    ROLLBACK;

-- A saw data that never existed!

Anomaly allowed: Dirty reads

Read Committed (PostgreSQL/Oracle default)

Only see committed changes. Each statement sees latest committed data.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Transaction A                    -- Transaction B
BEGIN;                              BEGIN;
SELECT balance FROM accounts
  WHERE id = 1;
-- Returns: $1000
                                    UPDATE accounts SET balance = 900
                                      WHERE id = 1;
                                    COMMIT;

SELECT balance FROM accounts
  WHERE id = 1;
-- Returns: $900 (different!)

-- Same query, different result = Non-repeatable read

Anomaly allowed: Non-repeatable reads, phantoms

Repeatable Read (MySQL default)

See a snapshot as of transaction start. Same query returns same result.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Transaction A                    -- Transaction B
BEGIN;                              BEGIN;
SELECT balance FROM accounts
  WHERE id = 1;
-- Returns: $1000
                                    UPDATE accounts SET balance = 900
                                      WHERE id = 1;
                                    COMMIT;

SELECT balance FROM accounts
  WHERE id = 1;
-- Returns: $1000 (same as before!)

COMMIT;

Anomaly allowed: Phantom reads (in standard SQL; PostgreSQL prevents them)

Serializable

Transactions execute as if in serial order. No anomalies.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Fully isolated: equivalent to running one after another

Most restrictive but guarantees correctness.


8.7 Isolation Anomalies Explained

Dirty Read

Reading uncommitted data that may be rolled back.

    T1: UPDATE row SET x = 2
    T2: SELECT x  -- Returns 2
    T1: ROLLBACK
    -- T2 read value that never existed!

Non-Repeatable Read

Same query returns different results within one transaction.

    T1: SELECT x  -- Returns 1
    T2: UPDATE row SET x = 2
    T2: COMMIT
    T1: SELECT x  -- Returns 2 (different!)

Phantom Read

New rows appear in repeated queries.

    T1: SELECT * FROM accounts WHERE balance > 100
        -- Returns: Alice, Bob

    T2: INSERT INTO accounts (name, balance) VALUES ('Carol', 150)
    T2: COMMIT

    T1: SELECT * FROM accounts WHERE balance > 100
        -- Returns: Alice, Bob, Carol (phantom!)

Write Skew

Two transactions read overlapping data, make decisions, write non-overlapping data, creating invalid state.

    Constraint: At least one doctor must be on call

    T1: SELECT count(*) FROM on_call  -- Returns 2
    T2: SELECT count(*) FROM on_call  -- Returns 2

    T1: DELETE FROM on_call WHERE doctor = 'Alice'
    T2: DELETE FROM on_call WHERE doctor = 'Bob'

    T1: COMMIT (1 doctor remains, OK)
    T2: COMMIT (0 doctors remain, VIOLATES CONSTRAINT!)

Neither transaction saw the other’s delete.


8.8 Isolation Levels and Anomalies Matrix

Isolation LevelDirty ReadNon-RepeatablePhantomWrite Skew
Read UncommittedYesYesYesYes
Read CommittedNoYesYesYes
Repeatable ReadNoNoPossible*Yes
SerializableNoNoNoNo

*PostgreSQL prevents phantoms at Repeatable Read; MySQL allows them


8.9 Snapshot Isolation

Many databases implement Snapshot Isolation (SI), which is between Repeatable Read and Serializable:

    Snapshot Isolation Rules:
    1. Transaction sees consistent snapshot from start time
    2. Write conflicts detected at commit time
    3. First committer wins

    T1: BEGIN (snapshot at time 100)
    T2: BEGIN (snapshot at time 100)

    T1: UPDATE accounts SET balance = 900 WHERE id = 1
    T2: UPDATE accounts SET balance = 800 WHERE id = 1

    T1: COMMIT  -- Succeeds
    T2: COMMIT  -- FAILS! Write conflict on id=1

PostgreSQL: Repeatable Read is actually Snapshot Isolation.

Difference from true Serializable: SI allows write skew anomalies.


8.10 Serializable Snapshot Isolation (SSI)

PostgreSQL’s Serializable level uses SSI (Serializable Snapshot Isolation):

                    SSI: DETECTING CONFLICTS

    Track read/write dependencies:

    T1: Read A
    T2: Write A  ← conflict: T2 wrote what T1 read
    T1: Write B  ← conflict: T1 wrote after reading T2's input

    Dependency graph:
    T1 ──rw──► T2 ──wr──► T1  (cycle detected!)

    Result: Abort one transaction to break cycle

SSI adds overhead for tracking dependencies but provides true serializability without the locking overhead of traditional approaches.


8.11 Practical Isolation Level Selection

When to Use Read Committed

  • Default for most OLTP applications
  • Acceptable when slight inconsistency is OK
  • When you want maximum concurrency
-- OK: User profile view (momentary inconsistency acceptable)
SELECT name, email, last_login FROM users WHERE id = ?;

When to Use Repeatable Read / Snapshot Isolation

  • Reports that need consistent view of data
  • Transactions reading the same data multiple times
  • When non-repeatable reads would cause bugs
-- Generate consistent report
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT SUM(balance) FROM accounts;
SELECT COUNT(*) FROM accounts WHERE balance < 0;
COMMIT;
-- Both queries see same snapshot

When to Use Serializable

  • Financial transactions where write skew is dangerous
  • When correctness is critical
  • Simpler application logic (fewer edge cases to handle)
-- On-call schedule modification
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM on_call;
DELETE FROM on_call WHERE doctor = ?;
-- Database prevents invalid state
COMMIT;

8.12 MVCC Overhead and Maintenance

Table Bloat

MVCC creates dead rows that consume space:

-- PostgreSQL: Check table bloat
SELECT
    schemaname || '.' || relname as table,
    n_live_tup,
    n_dead_tup,
    round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

VACUUM Strategies

-- Manual VACUUM
VACUUM VERBOSE accounts;

-- Aggressive space reclamation
VACUUM FULL accounts;  -- Warning: locks table!

-- Autovacuum configuration
ALTER TABLE accounts SET (
    autovacuum_vacuum_threshold = 100,
    autovacuum_vacuum_scale_factor = 0.1
);

Transaction ID Wraparound

PostgreSQL transaction IDs are 32-bit and wrap around. Long-running transactions or failed vacuums can cause issues:

-- Check transaction ID age
SELECT
    datname,
    age(datfrozenxid) as xid_age,
    2^31 - age(datfrozenxid) as remaining
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- If approaching 2^31, VACUUM urgently!

8.13 Read-Only Transactions

For read-only workloads, declare transactions as read-only:

BEGIN READ ONLY;
SELECT * FROM large_table;
COMMIT;

Benefits:

  • Database can optimize (skip locking, use replicas)
  • Prevents accidental writes
  • May use older snapshot if acceptable
-- PostgreSQL: Use old snapshot if within limits
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;
SET TRANSACTION SNAPSHOT '00000003-0000001A-1';

8.14 Summary

MVCC enables concurrent access without readers blocking writers:

  • Multiple versions of each row coexist
  • Visibility rules determine which version each transaction sees
  • Isolation levels trade correctness for concurrency
  • Snapshot Isolation provides repeatable reads efficiently
  • SSI enables true serializability without traditional locking
  • VACUUM cleans up old versions to prevent bloat

Understanding MVCC helps you choose appropriate isolation levels and diagnose concurrency issues.


What’s Next

In Chapter 9, we’ll explore locking and other concurrency control mechanisms—the locks, latches, and protocols that databases use when MVCC alone isn’t enough.


“In MVCC, every writer is a creator, and every reader sees their own private reality.”