REPACK and WITHOUT OVERLAPS Replica Identity Indexes
Context and Background
This thread addresses a bug in the (hypothetical future) REPACK command when interacting with temporal tables that use WITHOUT OVERLAPS constraints as replica identity. Understanding the bug requires grasping several intersecting PostgreSQL features:
- Temporal tables /
WITHOUT OVERLAPS(SQL:2011, partially implemented in PG 18): These are primary key/unique constraints where one column is a range (or multirange) type, and uniqueness is enforced such that no two rows may have overlapping ranges for the same scalar key. Internally this is implemented via GiST indexes with a mix of equality operators (for scalar columns) and the&&overlap operator (for the range column). - Replica identity indexes: Used by logical replication (and related machinery) to locate the tuple corresponding to a remote change. A
WITHOUT OVERLAPSindex is legally usable as a replica identity since it enforces a form of uniqueness. - REPACK: A command (presumed to be the in-core version built on top of the
pg_repackextension's technique) that rebuilds a table's storage while allowing concurrent DML, using an identity index to match new-heap tuples against change-capture rows.
The structural overlap between REPACK and logical replication here is important: both need to take a tuple and locate "the same row" in a target relation using the identity index. Logical replication already solved this generically via build_replindex_scan_key(), which went through IndexAmTranslateCompareType(COMPARE_EQ) rather than hard-coding B-tree strategy numbers.
The Core Problem
Two distinct bugs are reported, both stemming from REPACK assuming a B-tree-shaped identity index:
Bug 1: Hard-coded BTEqualStrategyNumber
REPACK calls get_opfamily_member(opfamily, lefttype, righttype, BTEqualStrategyNumber) when constructing the scan key used to locate tuples via the identity index. This is wrong for two reasons:
- Strategy numbers are AM-specific. GiST uses a completely different strategy numbering scheme.
BTEqualStrategyNumber(3) does not mean equality in GiST; it refers to whatever operator is registered at strategy 3 in that opclass (often&&for range opclasses, or something else entirely). - For
WITHOUT OVERLAPSindexes the "equality" for the range column is actually the overlap operator&&. The index is a GiST index whose semantics for identity purposes combine per-column equality on the scalar key columns with range overlap on the temporal column. The canonical API for obtaining the right operator by semantic compare type isIndexAmTranslateCompareType(amoid, COMPARE_EQ), which returns the AM-specific strategy number corresponding to equality semantics.
The fix is to replace the hard-coded BTEqualStrategyNumber with a call to IndexAmTranslateCompareType(), mirroring exactly what build_replindex_scan_key() in execReplication.c does. This aligns REPACK's identity lookup with the established logical replication code path.
Bug 2: Missing recheck for lossy index scans
GiST on multirange types can return lossy matches: the index returns candidate TIDs that satisfy a superset of the predicate, and the executor must apply index_rescan-style recheck (or a heap-side recheck using the original qual) to filter false positives. B-tree equality is never lossy, so REPACK's find_target_tuple() was written to trust the first tuple returned by the index scan.
With multirange overlap semantics, the index can hand back a tuple whose range does not actually overlap the target range; REPACK would then operate on the wrong row, corrupting the repacked heap. The fix requires performing the recheck: either honoring xs_recheck from IndexScanDescData and re-evaluating the scan keys against the heap tuple, or switching to an executor-level scan that handles recheck natively.
Design Implications
The deeper insight here is that REPACK should not reinvent identity-index matching. The logical replication subsystem (execReplication.c) has already encoded the correct abstractions:
RelationFindReplTupleByIndex()handles AM-agnostic equality viaIndexAmTranslateCompareType.- It naturally accommodates recheck because it uses the executor's index-scan infrastructure.
A more ambitious fix would refactor REPACK to call into shared helpers rather than duplicating scan-key construction. The submitted patch takes the narrower, surgical approach: fix the strategy-number lookup and add recheck at the REPACK call site. This is defensible for a bug fix but leaves a latent duplication hazard — any future AM-compare-type semantics (e.g., COMPARE_OVERLAPS being treated specially) would need to be patched in two places.
Reviewer Feedback (Kirill Reshke)
Reshke's response is brief but raises two substantive points:
-
CHECK_FOR_INTERRUPTS(): REPACK loops over potentially very large heaps. If the fix adds a recheck loop (skipping false-positive tuples until a true match is found), that inner loop becomes a new place where a runaway scan could ignore query cancellation. Adding a
CFI()call in the recheck loop is good defensive practice and consistent with other long-running scan loops in the backend. -
Test strategy — isolation vs. regression: The patch ships
.specfiles (isolation tests) with injection points. Reshke questions whether a plain regression test would exercise the same code paths. This is a legitimate concern:- Isolation tests plus injection points are heavyweight and increase test runtime and maintenance burden.
- However, REPACK is specifically about concurrent DML during the rebuild; the very bug surface (identity matching against changes captured during REPACK) only manifests when another session performs DML mid-rebuild. A plain regression test that just runs
REPACKon a static table would not exercisefind_target_tuple()at all, because there would be no change records to apply. - Injection points are likely needed to deterministically interleave the REPACK phases with the concurrent session's modifications.
The correct answer is probably that isolation tests are needed, but the patch author should justify this explicitly in the commit message.
Architectural Significance
This bug is a canonical example of what happens when a feature (temporal WITHOUT OVERLAPS) expands the set of valid replica identity indexes beyond B-tree, and pre-existing consumers of "the replica identity index" haven't been audited for B-tree assumptions. The same class of bug likely exists — or existed — in other places that accept a replica identity index but use BTEqualStrategyNumber directly. A grep for BTEqualStrategyNumber in code paths that accept arbitrary relreplident indexes would be a prudent follow-up audit.
The episode also underscores why IndexAmTranslateCompareType was introduced: it is the AM abstraction boundary for "equality-like" operator lookup, and any code that bypasses it is a latent bug for non-B-tree identity indexes.