Deep Technical Analysis: VACUUM FULL / CREATE INDEX Fails with "missing chunk number 0 for toast value"
Core Problem
This thread identifies a subtle but serious consistency bug in PostgreSQL's VACUUM mechanism related to horizon divergence between a heap table and its associated TOAST table during vacuum operations. The bug manifests as data corruption visible to users — specifically, a main table retains a tuple reference to TOAST data that has already been physically removed.
The Architectural Root Cause
The bug arises from an interaction between two different visibility mechanisms that use different scoping rules:
-
ComputeXidHorizons(used during VACUUM): When computingdata_oldest_nonremovable, this function:- Skips processes with
PROC_IN_VACUUMset - Filters by database — only considers transactions in the same database for the data horizon
- Feeds into
GlobalVisState->definitely_needed(monotonically increasing)
- Skips processes with
-
GetSnapshotData(used during CREATE INDEX): When computing snapshot xmin:- Also skips
PROC_IN_VACUUMprocesses - Does NOT filter by database — all transactions across all databases contribute to xmin
- Also skips
This asymmetry creates a window where:
The Race Condition Step-by-Step
-
VACUUM begins on
tbl: A transaction T1 in the same database holds the horizon back. The deleted tuple intblis classified asRECENTLY_DEAD— it survives. -
VACUUM proceeds to the TOAST table: By this time, T1 has committed. The horizon computation for the TOAST table runs fresh — with no same-database transactions holding it back,
OldestXminadvances past the TOAST tuple's xmax. The TOAST chunks are removed. -
CREATE INDEX on
tbl: The index build takes a new snapshot. A transaction T2 in a different database contributes its xmin to the snapshot (sinceGetSnapshotDatais cross-database). This xmin is less than the dead tuple's xmax, causingHeapTupleSatisfiesVacuumto classify the main table tuple asRECENTLY_DEAD(not DEAD). The index build attempts to fetch the TOAST value — but the chunks are gone.
Result: ERROR: missing chunk number 0 for toast value XXX
Why This Matters Architecturally
This is a data consistency violation — the fundamental contract that a TOAST reference from a visible (or potentially-visible) heap tuple always points to valid TOAST data is broken. This bug:
- Can cause index builds to fail
- Can cause VACUUM FULL to fail (which does heap rewrites)
- Potentially affects any operation that detoasts values from tuples that are
RECENTLY_DEAD - Undermines the invariant that TOAST vacuum should never remove data still reachable from the parent table
Proposed Solutions
Primary Proposal: Cache Parent's OldestXmin for TOAST Vacuum
The patch adds two fields to VacuumParams:
cached_parent_oldest_xmin: TheOldestXmincomputed during the parent table's vacuumcached_parent_cutoffs_valid: Boolean flag indicating validity
In heap_vacuum_rel(), when vacuuming a TOAST table, instead of calling GetOldestNonRemovableTransactionId() fresh, the code uses the parent's cached value. This ensures the TOAST vacuum never removes tuples that the parent vacuum considered potentially visible.
Strengths:
- Directly addresses the root cause
- Maintains the invariant: TOAST cleanup is bounded by parent's horizon
- Localized change within the vacuum subsystem
Known Issues:
- Freezing cutoff problems: The parent's OldestXmin may be "far in the past" relative to what the TOAST table would normally compute, causing assertion failures in freezing logic
make checkfailures indicate the freezing path needs special handling when using a cached/older horizon
Alternative: Add definitely_needed Check in CREATE INDEX
Instead of fixing TOAST vacuum, this approach would add an additional visibility check during index builds:
- A tuple classified as
RECENTLY_DEADby OldestXmin would be reclassified as DEAD ifdefinitely_needed(fromGlobalVisState) confirms it's safe to remove
Tradeoffs:
- Adds per-tuple overhead during index builds (performance regression)
- Doesn't fix the root cause — just papers over it at the consumer side
- Would need to be added to every code path that detoasts from potentially-dead tuples (VACUUM FULL, CLUSTER, etc.)
Key Technical Observations
The Database-Scoping Asymmetry is By Design (But Has Consequences)
The per-database scoping in ComputeXidHorizons is an optimization — it allows VACUUM to be more aggressive within a database by ignoring transactions that cannot possibly hold snapshots referencing the current database's data. However, TOAST tables live in the same database as their parent, so the TOAST vacuum seeing a different (advanced) horizon is a logical error in the sequencing, not in the horizon computation itself.
The Timing Window
The critical window is between:
- Parent table vacuum computing its horizon (T1 still active → tuple survives)
- TOAST table vacuum computing its horizon (T1 committed → TOAST chunks removed)
This is a serial execution within a single VACUUM command — the parent and TOAST vacuums happen sequentially (line ~2300 in src/backend/commands/vacuum.c), making the race quite practical to hit in production with long-running transactions that commit during vacuum.
Relation to Bug #18351
This is a known but unresolved issue. The original report likely lacked a clear reproduction case, making it difficult to diagnose. This submission provides both the reproduction and root-cause analysis.
Assessment of the Proposed Fix
The "cache parent OldestXmin" approach is architecturally sound but implementation-incomplete. The key challenge is that PostgreSQL's vacuum/freeze machinery assumes the cutoffs are computed coherently — using a "stale" OldestXmin from the parent while other cutoffs (freeze limits, MultiXact cutoffs) are computed fresh for the TOAST table creates inconsistencies.
A more complete fix would likely need to:
- Cache all relevant cutoffs from the parent vacuum (not just OldestXmin)
- Or take the minimum of parent's and fresh TOAST cutoffs for each parameter
- Handle the case where the parent's OldestXmin is older than what the TOAST table's
relfrozenxidwould normally allow
The alternative approach (fixing consumers like CREATE INDEX) is less appealing because it treats symptoms rather than the cause, and the number of affected code paths is unbounded.