2026-06-01 · claude-opus-4-6
REPACK [CONCURRENTLY] — May 2026 Monthly Summary
Overview
May 2026 was dominated by the decision to revert db-specific snapshots from PG19, while the core REPACK CONCURRENTLY feature remains committed. The month saw rigorous technical review that exposed architectural flaws in the db-specific snapshot optimization, culminating in Alvaro formally accepting the revert on May 19 and pushing it on May 24. A separate WAL accumulation bug was also identified and fixed.
Key Developments
db-specific Snapshots: From Review to Revert
The db-specific snapshot optimization (commit 0d3dba38c777) — designed to allow multiple concurrent REPACK operations across different databases without blocking each other's snapshot construction — underwent intense scrutiny throughout the month:
- Amit Kapila (May 8) raised four architectural concerns: unconditional WAL amplification from LogStandbySnapshot(MyDatabaseId) post-CONSISTENT, duplicate lock processing on standbys, stale xmin in ReorderBuffer cleanup, and fundamental incompatibility with failover slots/slotsync.
- Masahiko Sawada (May 8) independently found cross-database snapshot poisoning (REPACK in DB-B could restore a snapshot from DB-A) and demonstrated that need_shared_catalogs=false slots cannot work on standbys.
- Antonin Houska produced a deterministic reproducer via injection points, confirming the root cause: RecordTransactionCommit() writes COMMIT WAL before ProcArrayEndTransaction() removes the XID from procarray, allowing LogStandbySnapshot() to capture already-committed XIDs as "still running."
- Hayato Kuroda independently confirmed the same mechanism via gdb.
- Amit's final position (pre-beta-1): the entire approach is architecturally unsound, because logical decoding is designed for cluster-wide transactions and the patch only partially bypasses that assumption during snapshot construction. He endorsed reverting for PG19.
- Alvaro conceded (May 19) that there was insufficient time before beta-1 to address the architectural concerns. He committed the revert on May 24 (CI build 5520722497372160).
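The ordering problem behind Antonin's reproducer can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: the Python function names only mirror the C functions named above, and the list-based WAL, the set-based procarray, and the helper stale_running_xids() are assumptions made for illustration.

```python
# Toy model only: names mirror PostgreSQL functions, but nothing here is PostgreSQL code.
wal = []            # ordered list of (record_type, payload) tuples
procarray = {100}   # XIDs currently advertised as running


def record_transaction_commit(xid):
    """Step 1 of commit: the COMMIT record reaches WAL first."""
    wal.append(("COMMIT", xid))


def proc_array_end_transaction(xid):
    """Step 2 of commit: only now does the XID leave the procarray."""
    procarray.discard(xid)


def log_standby_snapshot():
    """Log the set of XIDs that look running at this instant."""
    wal.append(("RUNNING_XACTS", frozenset(procarray)))


def stale_running_xids(records):
    """Return XIDs reported as running after their COMMIT record (hypothetical helper)."""
    committed, stale = set(), set()
    for kind, payload in records:
        if kind == "COMMIT":
            committed.add(payload)
        elif kind == "RUNNING_XACTS":
            stale |= committed & payload
    return stale


# The snapshot is logged inside the window between the two commit steps.
record_transaction_commit(100)
log_standby_snapshot()
proc_array_end_transaction(100)

print(stale_running_xids(wal))
```

A decoder reading this WAL in order sees the COMMIT for XID 100 and then a running-transactions record that still lists it, which is the inconsistency the reviewers described.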
PG19 Ships With Known Limitation
With the revert, PG19's REPACK CONCURRENTLY uses standard cluster-wide snapshot building. Only one REPACK CONCURRENTLY can build its snapshot at a time per cluster — subsequent ones block. This primarily affects multi-tenant systems. The optimization is deferred to PG20 for holistic redesign.
WAL Accumulation Bug Fix
Hou Zhijie identified that REPACK CONCURRENTLY causes unnecessary WAL file accumulation during operation — the logical replication slot's restart_lsn is not advancing properly. Alvaro confirmed this is a "thinko" and indicated the fix will be pushed for PG19. Given that REPACK can run for hours on large tables, this is a meaningful operational fix to prevent disk exhaustion.
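To give a sense of scale, the following sketch estimates how many WAL segments stay pinned when restart_lsn does not move. It assumes the default 16 MiB segment size and treats LSNs as plain byte offsets; the function names are hypothetical, and other retention factors (checkpoints, wal_keep_size, other slots) are ignored.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default segment size; configurable at initdb


def segment_number(lsn, seg_size=WAL_SEGMENT_SIZE):
    """Index of the WAL segment containing the given byte position."""
    return lsn // seg_size


def retained_segments(restart_lsn, current_lsn, seg_size=WAL_SEGMENT_SIZE):
    """Segments that must be kept: from the one holding restart_lsn up to the current one."""
    return segment_number(current_lsn, seg_size) - segment_number(restart_lsn, seg_size) + 1


ten_gib = 10 * 1024**3

# Slot stuck at the start while 10 GiB of WAL is written during the REPACK.
stuck = retained_segments(0, ten_gib)

# Slot whose restart_lsn trails the write position by one byte.
advancing = retained_segments(ten_gib - 1, ten_gib)

print(stuck, advancing)
```

Under these assumptions the stuck slot pins 641 segments (roughly 10 GiB), while the advancing slot pins 2.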
Architectural Context (Established Prior to May)
REPACK CONCURRENTLY's architecture (derived from pg_squeeze):
- Acquires ShareUpdateExclusiveLock on the target table
- Background worker runs logical decoding, writing changes to a SharedFileSet
- Main backend copies old heap to new heap, builds indexes
- Replays decoded changes onto new heap
- Upgrades to AccessExclusiveLock and swaps relfilenodes
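The copy-then-replay flow in the steps above can be sketched as a toy model. Everything here is an illustrative assumption: the heap is a dict, a value of None stands for a dead tuple, the change tuples stand in for decoded logical changes, and locking and index builds are not modelled.

```python
def repack_concurrently(old_heap, concurrent_changes):
    """Toy copy-and-replay.

    old_heap: dict mapping key -> row, where None marks a dead tuple.
    concurrent_changes: list of (op, key, row) captured while the copy runs.
    Returns the new heap that would be swapped in.
    """
    # Copy phase: only live rows are carried over to the new heap.
    new_heap = {key: row for key, row in old_heap.items() if row is not None}

    # Replay phase: apply changes in the order they were decoded.
    for op, key, row in concurrent_changes:
        if op in ("insert", "update"):
            new_heap[key] = row
        elif op == "delete":
            new_heap.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")

    return new_heap


old = {1: "a", 2: None, 3: "c"}
changes = [("update", 1, "a2"), ("delete", 3, None), ("insert", 4, "d")]
print(repack_concurrently(old, changes))
```

The point of the sketch is the ordering: changes made during the copy are not lost, because they are applied to the new heap before the swap.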
Known deferred issues for PG20: MVCC safety (tuples carry REPACK's xmin), xmin horizon pinning, lock-upgrade deadlock handling improvements.
WAL Accumulation Fix: Review, Refinement, and Commit
The WAL accumulation bug fix for REPACK CONCURRENTLY progressed through review and was committed.
Amit Kapila's Review (2026-05-28)
Amit provided three substantive review points on Hou's patch:
1. Comment should explain why advancing the slot is safe: Amit notes that manually advancing the replication slot on WAL segment boundaries is only safe because REPACK creates a temporary slot that is dropped if REPACK fails — there's no scenario where the slot needs to restart decoding from an earlier position while still alive. He requests this reasoning be documented in the code comment.
2. ReplicationSlotsComputeRequiredLSN() change is a pre-existing bug: The patch moves ReplicationSlotsComputeRequiredLSN() outside the if block so it's called only when updated_restart is true. Amit points out this is a pre-existing issue (not caused by the REPACK patch) and suggests noting it separately in the commit message for clarity.
3. catalog_xmin advancement question: Amit raises a concern about the commit message's explicit deferral of catalog_xmin advancement — asking whether long-running REPACKs will accumulate dead catalog tuples needlessly. He asks if there are ideas to avoid this.
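The idea of advancing a temporary slot at WAL segment boundaries can be sketched as follows. This is an assumption-laden illustration, not the committed code: the TempSlot class, the function name, and the exact advancement rule are hypothetical, and the summary does not quote the patch's actual conditions.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size


class TempSlot:
    """Stand-in for a temporary replication slot (hypothetical)."""

    def __init__(self, restart_lsn):
        self.restart_lsn = restart_lsn


def advance_on_segment_boundary(slot, decoded_upto, seg_size=SEGMENT_SIZE):
    """Move restart_lsn to the start of the segment containing decoded_upto.

    Returns True only when the slot actually moved forward. Moving it is
    treated as safe here because the slot is temporary and is dropped on
    failure, so decoding never has to restart from an earlier position.
    """
    segment_start = (decoded_upto // seg_size) * seg_size
    if segment_start > slot.restart_lsn:
        slot.restart_lsn = segment_start
        return True
    return False


slot = TempSlot(restart_lsn=0)
print(advance_on_segment_boundary(slot, SEGMENT_SIZE - 1))  # still in the first segment
print(advance_on_segment_boundary(slot, SEGMENT_SIZE + 5))  # crossed into the second segment
print(slot.restart_lsn == SEGMENT_SIZE)
```

Once restart_lsn has moved past a segment, that segment is no longer pinned by this slot and becomes eligible for removal or recycling.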
Hou's Response on catalog_xmin
Hou argues this is not a new problem — all long-running commands (CLUSTER, VACUUM FULL, non-concurrent REPACK) that hold snapshots cause the same catalog dead tuple accumulation. Since catalog_xmin only affects system catalog tuples (less harmful than user table bloat), handling it independently is reasonable. He references Antonin's earlier proposal [1] for snapshot resetting during the copy phase as the path forward. Hou submitted v4 with improved comments per Amit's suggestions.
Amit Accepts, Alvaro Commits
Amit agrees the catalog_xmin concern can be handled separately and confirms the code changes look good (without testing). Alvaro pushed the fix on 2026-05-29, split into two parts, with a rewritten comment in decode_concurrent_changes().
Hou's Test Patch
Hou also submitted a TAP test (0002) verifying WAL file removal during REPACK CONCURRENTLY, reporting acceptable complexity and speed (< 1s). It's unclear from the messages whether this test was included in what Alvaro pushed (he says "Pushed 0001, in two parts").