=?UTF-8?B?VkFDVVVNIEZVTEwgb3IgQ1JFQVRFIElOREVYIGZhaWxzIHdpdGggZXJyb3I6?= =?UTF-8?B?IG1pc3NpbmcgY2h1bmsgbnVtYmVyIDAgZm9yIHRvYXN0IHZhbHVlIFhYWA==?=

First seen: 2026-05-18 04:20:55+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-20 · claude-opus-4-6

Deep Technical Analysis: VACUUM FULL / CREATE INDEX Fails with "missing chunk number 0 for toast value"

Core Problem

This thread identifies a subtle but serious consistency bug in PostgreSQL's VACUUM mechanism related to horizon divergence between a heap table and its associated TOAST table during vacuum operations. The bug manifests as data corruption visible to users — specifically, a main table retains a tuple reference to TOAST data that has already been physically removed.

The Architectural Root Cause

The bug arises from an interaction between two different visibility mechanisms that use different scoping rules:

  1. ComputeXidHorizons (used during VACUUM): When computing data_oldest_nonremovable, this function:

    • Skips processes with PROC_IN_VACUUM set
    • Filters by database — only considers transactions in the same database for the data horizon
    • Feeds into GlobalVisState->definitely_needed (monotonically increasing)
  2. GetSnapshotData (used during CREATE INDEX): When computing snapshot xmin:

    • Also skips PROC_IN_VACUUM processes
    • Does NOT filter by database — all transactions across all databases contribute to xmin

This asymmetry creates a window where:

The Race Condition Step-by-Step

  1. VACUUM begins on tbl: A transaction T1 in the same database holds the horizon back. The deleted tuple in tbl is classified as RECENTLY_DEAD — it survives.

  2. VACUUM proceeds to the TOAST table: By this time, T1 has committed. The horizon computation for the TOAST table runs fresh — with no same-database transactions holding it back, OldestXmin advances past the TOAST tuple's xmax. The TOAST chunks are removed.

  3. CREATE INDEX on tbl: The index build takes a new snapshot. A transaction T2 in a different database contributes its xmin to the snapshot (since GetSnapshotData is cross-database). This xmin is less than the dead tuple's xmax, causing HeapTupleSatisfiesVacuum to classify the main table tuple as RECENTLY_DEAD (not DEAD). The index build attempts to fetch the TOAST value — but the chunks are gone.

Result: ERROR: missing chunk number 0 for toast value XXX

Why This Matters Architecturally

This is a data consistency violation — the fundamental contract that a TOAST reference from a visible (or potentially-visible) heap tuple always points to valid TOAST data is broken. This bug:

Proposed Solutions

Primary Proposal: Cache Parent's OldestXmin for TOAST Vacuum

The patch adds two fields to VacuumParams:

In heap_vacuum_rel(), when vacuuming a TOAST table, instead of calling GetOldestNonRemovableTransactionId() fresh, the code uses the parent's cached value. This ensures the TOAST vacuum never removes tuples that the parent vacuum considered potentially visible.

Strengths:

Known Issues:

Alternative: Add definitely_needed Check in CREATE INDEX

Instead of fixing TOAST vacuum, this approach would add an additional visibility check during index builds:

Tradeoffs:

Key Technical Observations

The Database-Scoping Asymmetry is By Design (But Has Consequences)

The per-database scoping in ComputeXidHorizons is an optimization — it allows VACUUM to be more aggressive within a database by ignoring transactions that cannot possibly hold snapshots referencing the current database's data. However, TOAST tables live in the same database as their parent, so the TOAST vacuum seeing a different (advanced) horizon is a logical error in the sequencing, not in the horizon computation itself.

The Timing Window

The critical window is between:

  1. Parent table vacuum computing its horizon (T1 still active → tuple survives)
  2. TOAST table vacuum computing its horizon (T1 committed → TOAST chunks removed)

This is a serial execution within a single VACUUM command — the parent and TOAST vacuums happen sequentially (line ~2300 in src/backend/commands/vacuum.c), making the race quite practical to hit in production with long-running transactions that commit during vacuum.

Relation to Bug #18351

This is a known but unresolved issue. The original report likely lacked a clear reproduction case, making it difficult to diagnose. This submission provides both the reproduction and root-cause analysis.

Assessment of the Proposed Fix

The "cache parent OldestXmin" approach is architecturally sound but implementation-incomplete. The key challenge is that PostgreSQL's vacuum/freeze machinery assumes the cutoffs are computed coherently — using a "stale" OldestXmin from the parent while other cutoffs (freeze limits, MultiXact cutoffs) are computed fresh for the TOAST table creates inconsistencies.

A more complete fix would likely need to:

  1. Cache all relevant cutoffs from the parent vacuum (not just OldestXmin)
  2. Or take the minimum of parent's and fresh TOAST cutoffs for each parameter
  3. Handle the case where the parent's OldestXmin is older than what the TOAST table's relfrozenxid would normally allow

The alternative approach (fixing consumers like CREATE INDEX) is less appealing because it treats symptoms rather than the cause, and the number of affected code paths is unbounded.