Adding REPACK [concurrently]

First seen: 2025-07-26 21:56:04+00:00 · Messages: 375 · Participants: 30

Latest Update

2026-06-04 · claude-opus-4-6

Incremental Update: June 1–3, 2026

Minimal Substantive Progress

Only two new messages appeared since the last analysis, neither introducing significant new technical content:

1. Hou Zhijie (June 1): Acknowledges Alvaro's commit of the WAL accumulation fix, confirms the rewritten comments look good, and submits a corrected version of the 0002 TAP test patch (fixing a missed DROP SLOT from previous tests that was preventing WAL removal in the test environment). This is a minor test bug fix, not a new technical development.

2. Mikhail Nikalayeu (June 3): Resurfaces the known lock-upgrade deadlock issue: when REPACK CONCURRENTLY upgrades from ShareUpdateExclusiveLock to AccessExclusiveLock, it can deadlock with concurrent DDL and lose potentially hours of repack work. He links to the solution he previously submitted on a separate thread and mentions plans to continue work on a REPACK stress test suite. The message is a reminder rather than new analysis and adds no new technical detail.
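
The lock-upgrade deadlock in item 2 can be illustrated with a toy wait-for graph. The scenario below is an assumption chosen for illustration, not one taken from the thread: another session has already read the table inside a transaction (holding AccessShareLock) and then issues DDL that needs AccessExclusiveLock, while REPACK, holding ShareUpdateExclusiveLock, requests its own upgrade. The conflict sets are a subset of PostgreSQL's documented table-lock conflict matrix; the session names and helper functions are hypothetical.

```python
# Subset of PostgreSQL's table-level lock conflict matrix:
# CONFLICTS[requested] is the set of held modes that block the request.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "AccessExclusiveLock": {
        "AccessShareLock", "RowShareLock", "RowExclusiveLock",
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
}


def wait_for_edges(held, wanted):
    """Return (waiter, blocker) pairs for one table.

    held and wanted map a session name to a single lock mode.
    """
    edges = set()
    for waiter, mode in wanted.items():
        for holder, held_mode in held.items():
            if holder != waiter and held_mode in CONFLICTS[mode]:
                edges.add((waiter, holder))
    return edges


def has_cycle(edges):
    """True if the wait-for graph contains a cycle, i.e. a deadlock."""
    graph = {}
    for waiter, blocker in edges:
        graph.setdefault(waiter, set()).add(blocker)

    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    return any(reachable(node, node, set()) for node in graph)


# Assumed scenario: both sessions hold a weaker lock and both want the strongest.
held = {"repack": "ShareUpdateExclusiveLock", "ddl_session": "AccessShareLock"}
wanted = {"repack": "AccessExclusiveLock", "ddl_session": "AccessExclusiveLock"}

edges = wait_for_edges(held, wanted)
deadlocked = has_cycle(edges)
print(sorted(edges), deadlocked)
```

Each session waits on a lock the other already holds, so neither can proceed until one is cancelled. If the cancelled session is REPACK, the copy and replay work done so far is discarded, which is the cost the message highlights.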

Status of Known Issues

History (2 prior analyses)

2026-06-01 · claude-opus-4-6

REPACK [CONCURRENTLY] — May 2026 Monthly Summary

Overview

May 2026 was dominated by the decision to revert db-specific snapshots from PG19, while the core REPACK CONCURRENTLY feature remains committed. The month saw rigorous technical review that exposed architectural flaws in the db-specific snapshot optimization, culminating in Alvaro formally accepting the revert on May 19 and pushing it on May 24. A separate WAL accumulation bug was also identified and fixed.

Key Developments

db-specific Snapshots: From Review to Revert

The db-specific snapshot optimization (commit 0d3dba38c777) — designed to allow multiple concurrent REPACK operations across different databases without blocking each other's snapshot construction — underwent intense scrutiny throughout the month:

  1. Amit Kapila (May 8) raised four architectural concerns: unconditional WAL amplification from calling LogStandbySnapshot(MyDatabaseId) after the snapshot builder reaches CONSISTENT, duplicate lock processing on standbys, stale xmin in ReorderBuffer cleanup, and fundamental incompatibility with failover slots/slotsync.

  2. Masahiko Sawada (May 8) independently found cross-database snapshot poisoning (REPACK in DB-B could restore a snapshot from DB-A) and demonstrated that need_shared_catalogs=false slots cannot work on standbys.

  3. Antonin Houska produced a deterministic reproducer via injection points, confirming the root cause: RecordTransactionCommit() writes COMMIT WAL before ProcArrayEndTransaction() removes the XID from procarray, allowing LogStandbySnapshot() to capture already-committed XIDs as "still running."

  4. Hayato Kuroda independently confirmed the same mechanism via gdb.

  5. Amit's final position (pre-beta-1): the entire approach is architecturally unsound — logical decoding is designed for cluster-wide transactions, and the patch only partially bypasses that assumption during snapshot construction. He endorsed reverting for PG19.

  6. Alvaro conceded (May 19): insufficient time before beta-1 to address architectural concerns. Committed the revert on May 24 (CI build 5520722497372160).
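
The ordering problem described in item 3 can be sketched as a toy model. This is a simplified illustration only: the ToyCluster class and its method names are hypothetical stand-ins that mirror the PostgreSQL function names mentioned above, and the real code paths involve locking and WAL details that are omitted here.

```python
class ToyCluster:
    """Toy stand-in for the commit path and the running-xacts snapshot."""

    def __init__(self):
        self.wal = []           # ordered (record_kind, payload) pairs
        self.procarray = set()  # XIDs currently advertised as running

    def begin(self, xid):
        self.procarray.add(xid)

    def record_transaction_commit(self, xid):
        # First half of commit: the COMMIT record is written to WAL.
        self.wal.append(("COMMIT", xid))

    def proc_array_end_transaction(self, xid):
        # Second half of commit: the XID stops being advertised as running.
        self.procarray.discard(xid)

    def log_standby_snapshot(self):
        # Logs whatever the procarray shows at this instant.
        self.wal.append(("RUNNING_XACTS", frozenset(self.procarray)))


def stale_running_xids(wal):
    """XIDs that a RUNNING_XACTS record lists as running even though
    their COMMIT record appears earlier in the WAL stream."""
    committed = set()
    stale = set()
    for kind, payload in wal:
        if kind == "COMMIT":
            committed.add(payload)
        elif kind == "RUNNING_XACTS":
            stale |= payload & committed
    return stale


cluster = ToyCluster()
cluster.begin(100)
cluster.record_transaction_commit(100)   # COMMIT is already in WAL
cluster.log_standby_snapshot()           # snapshot lands inside the window
cluster.proc_array_end_transaction(100)  # XID leaves the procarray too late

print(stale_running_xids(cluster.wal))
```

A decoder reading this WAL stream in order sees the commit of XID 100 and then a record claiming the same XID is still running, which is the inconsistency the injection-point reproducer made deterministic.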

PG19 Ships With Known Limitation

With the revert, PG19's REPACK CONCURRENTLY uses standard cluster-wide snapshot building. Only one REPACK CONCURRENTLY can build its snapshot at a time per cluster — subsequent ones block. This primarily affects multi-tenant systems. The optimization is deferred to PG20 for holistic redesign.

WAL Accumulation Bug Fix

Hou Zhijie identified that REPACK CONCURRENTLY causes unnecessary WAL file accumulation during operation — the logical replication slot's restart_lsn is not advancing properly. Alvaro confirmed this is a "thinko" and indicated the fix will be pushed for PG19. Given that REPACK can run for hours on large tables, this is a meaningful operational fix to prevent disk exhaustion.
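
A rough back-of-the-envelope model shows why a stuck restart_lsn matters. The sketch assumes the default 16 MB WAL segment size and ignores every other retention factor (checkpoints, wal_keep_size, max_slot_wal_keep_size). The helper advance_to_segment_start is hypothetical and only approximates the idea of advancing the slot on segment boundaries; the committed fix may work differently.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL's default segment size


def segment_no(lsn):
    """Index of the WAL segment that contains the given byte position."""
    return lsn // WAL_SEGMENT_SIZE


def segments_pinned(restart_lsn, insert_lsn):
    """Segments that cannot be removed: from the one containing
    restart_lsn through the one currently being written."""
    return segment_no(insert_lsn) - segment_no(restart_lsn) + 1


def advance_to_segment_start(restart_lsn, decoded_upto_lsn):
    """Hypothetical helper: move restart_lsn to the start of the segment
    holding the last decoded position, never moving it backwards."""
    candidate = segment_no(decoded_upto_lsn) * WAL_SEGMENT_SIZE
    return max(restart_lsn, candidate)


# A long REPACK during which 100 segments (about 1.6 GB) of WAL were written.
restart_lsn = 0
insert_lsn = 100 * WAL_SEGMENT_SIZE + 5
print(segments_pinned(restart_lsn, insert_lsn))

# If decoding has consumed everything up to segment 99, the slot can move up.
restart_lsn = advance_to_segment_start(restart_lsn, 99 * WAL_SEGMENT_SIZE + 10)
print(segments_pinned(restart_lsn, insert_lsn))
```

In this model the number of pinned segments grows linearly with the WAL written while restart_lsn stays put, and drops to a small constant once the slot is allowed to advance.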

Architectural Context (Established Prior to May)

REPACK CONCURRENTLY's architecture (derived from pg_squeeze):

  • Acquires ShareUpdateExclusiveLock on the target table
  • Background worker runs logical decoding, writing changes to a SharedFileSet
  • Main backend copies old heap to new heap, builds indexes
  • Replays decoded changes onto new heap
  • Upgrades to AccessExclusiveLock and swaps relfilenodes
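
The steps above can be sketched as a single-threaded toy. This is only a conceptual model under strong simplifying assumptions: a dict stands in for the heap, a list stands in for the decoded-changes file, and there are no locks, snapshots, indexes, or real logical decoding. The function names are hypothetical.

```python
def apply_change(heap, op, key, value):
    """Apply one decoded change to a heap modeled as a dict."""
    if op == "delete":
        heap.pop(key, None)
    else:  # "insert" or "update"
        heap[key] = value


def repack_sketch(table, changes_during_copy):
    """Toy walk-through of the copy, decode, replay, swap sequence.

    table is a mutable container whose "heap" entry stands in for the
    relation's current storage.
    """
    # Rows visible when the copy starts.
    snapshot = dict(table["heap"])

    # Concurrent DML keeps hitting the old heap; decoding captures it.
    change_log = []
    for op, key, value in changes_during_copy:
        apply_change(table["heap"], op, key, value)
        change_log.append((op, key, value))

    # Copy phase: build the new heap from the snapshot (sorted here to
    # mimic a clustered rewrite).
    new_heap = dict(sorted(snapshot.items()))

    # Replay phase: bring the new heap up to date with the captured changes.
    for change in change_log:
        apply_change(new_heap, *change)

    # Swap phase: in the real feature this happens under AccessExclusiveLock.
    table["heap"] = new_heap
    return table


table = {"heap": {3: "c", 1: "a", 2: "b"}}
changes = [("insert", 4, "d"), ("update", 1, "A"), ("delete", 2, None)]
print(repack_sketch(table, changes)["heap"])
```

The point of the model is the invariant: after replay, the new heap holds the same rows as the old heap with all concurrent changes applied, so the final swap needs the exclusive lock only briefly.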

Known deferred issues for PG20: MVCC safety (tuples carry REPACK's xmin), xmin horizon pinning, lock-upgrade deadlock handling improvements.


2026-06-01 · claude-opus-4-6

WAL Accumulation Fix: Review, Refinement, and Commit

The WAL accumulation bug fix for REPACK CONCURRENTLY progressed through review and was committed.

Amit Kapila's Review (2026-05-28)

Amit provided three substantive review points on Hou's patch:

1. Comment should explain why advancing the slot is safe: Amit notes that manually advancing the replication slot on WAL segment boundaries is only safe because REPACK creates a temporary slot that is dropped if REPACK fails — there's no scenario where the slot needs to restart decoding from an earlier position while still alive. He requests this reasoning be documented in the code comment.

2. ReplicationSlotsComputeRequiredLSN() change addresses a pre-existing bug: The patch moves the ReplicationSlotsComputeRequiredLSN() call outside the if block so it's called only when updated_restart is true. Amit points out that this is a pre-existing issue (not caused by the REPACK patch) and suggests noting it separately in the commit message for clarity.

3. catalog_xmin advancement question: Amit raises a concern about the commit message's explicit deferral of catalog_xmin advancement — asking whether long-running REPACKs will accumulate dead catalog tuples needlessly. He asks if there are ideas to avoid this.

Hou's Response on catalog_xmin

Hou argues this is not a new problem — all long-running commands (CLUSTER, VACUUM FULL, non-concurrent REPACK) that hold snapshots cause the same catalog dead tuple accumulation. Since catalog_xmin only affects system catalog tuples (less harmful than user table bloat), handling it independently is reasonable. He references Antonin's earlier proposal [1] for snapshot resetting during the copy phase as the path forward. Hou submitted v4 with improved comments per Amit's suggestions.

Amit Accepts, Alvaro Commits

Amit agrees the catalog_xmin concern can be handled separately and confirms the code changes look good (without testing). Alvaro pushed the fix on 2026-05-29, split into two parts, with a rewritten comment in decode_concurrent_changes().

Hou's Test Patch

Hou also submitted a TAP test (0002) verifying WAL file removal during REPACK CONCURRENTLY, reporting acceptable complexity and speed (< 1s). It's unclear from the messages whether this test was included in what Alvaro pushed (he says "Pushed 0001, in two parts").