Richard Guo's PGConf.dev Synthesis and Formal Options Enumeration (v5)
Richard returns from PGConf.dev with a message that reframes the discussion. Rather than defending the lock-based predicate further, he steps back to present a neutral taxonomy of the candidate approaches to the trigger-gap problem and explicitly solicits community input. This is a deliberate widening of the design space after in-person conversations raised "concerns" with approach (B).
Key new technical content
1. Concrete demonstration of the trigger gap on unpatched master
Richard provides a fully self-contained reproduction script (the show_gap() function called from DELETE ... RETURNING) that proves the trigger gap is observable today on stock PostgreSQL with no patch applied. This is pedagogically important: prior messages described the gap abstractly, and this is the first time the thread contains a copy-paste-runnable proof. The gap exists because FK enforcement runs as AFTER triggers queued until the end of the statement (or until commit, for deferred constraints), while a PL/pgSQL function called from the RETURNING clause runs its query through SPI under a fresh snapshot that already sees the rows the DELETE has removed so far. Any reviewer can verify this path in seconds.
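The mechanism behind the demonstration can be sketched outside PostgreSQL. The following toy C model is not Richard's script and not PostgreSQL code; the two tables, the ON DELETE CASCADE action, and every name in it (ToyDb, delete_parent(), fire_after_triggers() and so on) are hypothetical. It assumes the DELETE removes the parent row immediately but only queues the FK action, so a query evaluated in between sees a child row with no parent, and the join-removed form of that query counts it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the trigger gap (hypothetical; not PostgreSQL source).
 * parent(id) is referenced by child(parent_id) with ON DELETE CASCADE. */
enum { NPARENT = 2, NCHILD = 2 };

typedef struct {
    int  parent_id[NPARENT];
    bool parent_live[NPARENT];
    int  child_ref[NCHILD];   /* child.parent_id */
    bool child_live[NCHILD];
    int  pending_cascade;     /* queued FK action: parent id, 0 = none */
} ToyDb;

typedef struct {
    int gap_with_join, gap_join_removed;     /* seen mid-statement */
    int after_with_join, after_join_removed; /* seen after triggers fire */
} ToyCounts;

static void toy_init(ToyDb *db) {
    for (int i = 0; i < NPARENT; i++) { db->parent_id[i] = i + 1; db->parent_live[i] = true; }
    for (int i = 0; i < NCHILD; i++)  { db->child_ref[i] = i + 1; db->child_live[i] = true; }
    db->pending_cascade = 0;
}

/* SELECT count(*) FROM child JOIN parent ON child.parent_id = parent.id */
static int count_with_join(const ToyDb *db) {
    int n = 0;
    for (int c = 0; c < NCHILD; c++) {
        if (!db->child_live[c]) continue;
        for (int p = 0; p < NPARENT; p++) {
            if (db->parent_live[p] && db->parent_id[p] == db->child_ref[c]) { n++; break; }
        }
    }
    return n;
}

/* Same query after FK-based inner-join removal: SELECT count(*) FROM child */
static int count_join_removed(const ToyDb *db) {
    int n = 0;
    for (int c = 0; c < NCHILD; c++)
        if (db->child_live[c]) n++;
    return n;
}

/* DELETE FROM parent WHERE id = ...: the parent row goes away now,
 * but the FK cascade is only queued as an after-trigger event. */
static void delete_parent(ToyDb *db, int id) {
    assert(db->pending_cascade == 0);  /* this toy supports one queued event */
    for (int p = 0; p < NPARENT; p++)
        if (db->parent_id[p] == id) db->parent_live[p] = false;
    db->pending_cascade = id;
}

/* End of statement: the queued FK action runs and the invariant holds again. */
static void fire_after_triggers(ToyDb *db) {
    for (int c = 0; c < NCHILD; c++)
        if (db->child_ref[c] == db->pending_cascade) db->child_live[c] = false;
    db->pending_cascade = 0;
}

ToyCounts toy_run_delete_scenario(void) {
    ToyDb db;
    ToyCounts r;
    toy_init(&db);
    delete_parent(&db, 1);
    /* Inside the gap, e.g. from a function called in RETURNING */
    r.gap_with_join = count_with_join(&db);
    r.gap_join_removed = count_join_removed(&db);
    fire_after_triggers(&db);
    r.after_with_join = count_with_join(&db);
    r.after_join_removed = count_join_removed(&db);
    return r;
}
```

In this model toy_run_delete_scenario() reports 1 versus 2 inside the gap and 1 versus 1 once the queued action has run: the join-removed query is only correct when the FK invariant actually holds at the moment it runs.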
2. Six enumerated approaches (A through F)
This is the first time all options have been laid out in a single message with explicit pros/cons:
- (A) Abandon the optimization entirely — the null option.
- (B) Lock-based predicate (current v4/v5-0002). Richard now explicitly acknowledges the "replan storm" weakness: choose_custom_plan() trips on every invocation within a writing transaction, forcing replans even when the FK invariant actually holds for the current statement's snapshot. This is a new, previously unmentioned cost of (B) that goes beyond the "false positive" discussion in prior analyses.
- (C) Snapshot-anchored predicate, a new approach not previously discussed in the thread. The idea is to stamp each SnapshotData with a bit indicating whether it was captured while AfterTriggerPendingOnAnyRel() was true. This is maximally precise (only gap-born snapshots suppress the optimization) but contaminates snapshot.h with trigger-subsystem knowledge.
- (D) Close the gap entirely by making FK enforcement synchronous or atomic. Richard acknowledges this would be the ideal long-term fix but flags it as "invasive and difficult."
- (E) Accept wrong results, document them — essentially a pragmatic "ship it broken" option for corner cases.
- (F) Something else — open-ended invitation.
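To make the difference between (B) and (C) concrete, here is a minimal C sketch of the two predicates under stated assumptions: the transaction state is reduced to two fields, and the snapshot is reduced to the single bit that (C) proposes. ToyXactState, ToySnapshot and all function names are hypothetical; pending_after_trigger_events merely stands in for what AfterTriggerPendingOnAnyRel() would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified state; these names are illustrative
 * and are not PostgreSQL's actual structures. */
typedef struct {
    bool holds_row_exclusive_lock;     /* xact has written an FK relation */
    int  pending_after_trigger_events; /* stand-in for AfterTriggerPendingOnAnyRel() */
} ToyXactState;

/* Option (C): the extra bit stamped into each snapshot at capture time. */
typedef struct {
    bool taken_during_trigger_gap;
} ToySnapshot;

ToySnapshot toy_take_snapshot(const ToyXactState *x) {
    ToySnapshot s;
    assert(x->pending_after_trigger_events >= 0);
    s.taken_during_trigger_gap = x->pending_after_trigger_events > 0;
    return s;
}

/* Option (B): any RowExclusiveLock disables FK-based removal for the
 * rest of the transaction, whatever snapshot the query runs under. */
bool removal_allowed_lock_based(const ToyXactState *x) {
    return !x->holds_row_exclusive_lock;
}

/* Option (C): only snapshots born inside the gap disable removal. */
bool removal_allowed_snapshot_based(ToySnapshot s) {
    return !s.taken_during_trigger_gap;
}
```

After an UPDATE on an FK relation whose triggers have already fired, removal_allowed_lock_based() still refuses while removal_allowed_snapshot_based() allows the optimization; for a snapshot taken with events still queued, both refuse. That is the trade-off in miniature: (C) is precise, but only because the snapshot carries a trigger-subsystem bit.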
3. Patch split into two parts (v5)
v5 is now structured as:
- 0001: Pure structural inner-join-removal logic, assuming FK invariant holds unconditionally. Includes the TABLESAMPLE fix from Alex's review.
- 0002: The lock-based predicate (option B) as a separable safety layer.
Richard explicitly states 0001 is not safe to commit alone; the split is for review ergonomics only. This is a mature engineering decision that lets the community evaluate the structural removal logic independently of the gap-handling strategy.
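The 0001/0002 layering can be pictured as a structural test followed by a pluggable safety check. This C sketch is illustrative only: the conditions in ToyJoinShape are assumptions about what FK-based inner-join removal generally needs (plus the TABLESAMPLE exclusion from Alex's review), not the patch's actual checks, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one candidate inner join; the conditions are
 * illustrative assumptions, not the patch's actual checks. */
typedef struct {
    bool fk_matches_join_quals;  /* join quals are exactly the FK column pairs */
    bool fk_cols_not_null;       /* referencing columns cannot be NULL */
    bool referenced_has_quals;   /* extra filters on the referenced relation */
    bool referenced_cols_used;   /* referenced columns needed above the join */
    bool referenced_tablesample; /* TABLESAMPLE may drop referenced rows */
} ToyJoinShape;

/* "0001"-style layer: purely structural; assumes the FK invariant holds. */
static bool structurally_removable(const ToyJoinShape *j) {
    return j->fk_matches_join_quals && j->fk_cols_not_null &&
           !j->referenced_has_quals && !j->referenced_cols_used &&
           !j->referenced_tablesample;
}

/* "0002"-style layer: a separable check that the invariant holds now. */
typedef bool (*GapSafetyCheck)(const void *ctx);

/* Example check in the spirit of option (B): ctx points to a bool saying
 * whether the transaction holds RowExclusiveLock on an FK relation. */
bool lock_based_check(const void *ctx) {
    return !*(const bool *)ctx;
}

/* Passing gap_ok == NULL models applying 0001 alone, which the thread
 * considers unsafe because the FK invariant can be violated mid-statement. */
bool inner_join_removable(const ToyJoinShape *j, GapSafetyCheck gap_ok, const void *ctx) {
    assert(j != NULL);
    if (!structurally_removable(j))
        return false;
    return gap_ok == NULL || gap_ok(ctx);
}
```

Replacing lock_based_check with a snapshot-based check would change only the second layer, which is the property the split is meant to make reviewable.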
4. The "replan storm" problem (newly articulated)
Richard identifies a performance pathology in option (B): for cached plans that benefited from FK removal, choose_custom_plan() detects the RowExclusiveLock and forces a custom plan on every execution within the writing transaction. This is not a one-time replan but a per-execution overhead that persists for the rest of the transaction, because the lock is held until commit or abort. For connection-pooled OLTP workloads where transactions mix reads and writes, this could make the optimization a net loss and increase latency variance.
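The size of the difference can be shown with a small simulation. This C sketch assumes a single cached plan that relies on FK removal, a writing statement that takes a RowExclusiveLock kept until transaction end and queues FK events until its own end, and a must_replan() decision standing in for the check inside choose_custom_plan(). All names are hypothetical, and only the number of forced replans is modeled, not planning cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Which predicate guards the FK-optimized plan (hypothetical names). */
typedef enum { PRED_LOCK_BASED, PRED_SNAPSHOT_BASED } ToyPredicate;

typedef struct {
    bool holds_row_exclusive_lock; /* kept until the transaction ends */
    int  pending_fk_events;        /* queued until the writing statement ends */
} ToyXact;

/* Stand-in for the decision inside choose_custom_plan(): must this
 * execution abandon the cached FK-optimized generic plan? */
static bool must_replan(ToyPredicate pred, const ToyXact *x) {
    switch (pred) {
    case PRED_LOCK_BASED:
        return x->holds_row_exclusive_lock;
    case PRED_SNAPSHOT_BASED:
        /* A snapshot taken now would carry the "born in the gap" bit. */
        return x->pending_fk_events > 0;
    }
    return true; /* unknown predicate: be conservative */
}

/* One writing statement on an FK relation, during which the cached plan
 * runs reads_in_gap times (e.g. from RETURNING); then the statement ends,
 * its FK events fire, and the plan runs reads_after more times. */
int count_forced_replans(ToyPredicate pred, int reads_in_gap, int reads_after) {
    ToyXact x = { false, 0 };
    int replans = 0;

    assert(reads_in_gap >= 0 && reads_after >= 0);

    x.holds_row_exclusive_lock = true; /* writing statement starts */
    x.pending_fk_events = 1;
    for (int i = 0; i < reads_in_gap; i++)
        if (must_replan(pred, &x))
            replans++;

    x.pending_fk_events = 0;           /* after-trigger queue drained */
    for (int i = 0; i < reads_after; i++)
        if (must_replan(pred, &x))
            replans++;

    return replans;
}
```

With no executions inside the gap and 1000 afterwards, the lock-based rule forces 1000 replans and the snapshot-based rule forces none; executions inside the gap replan under both rules.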
What this message signals about project trajectory
Richard's neutral framing ("I'm not advocating any particular option") and his CC of the people he spoke with at PGConf.dev suggest that the lock-based predicate may not survive as the final approach. The snapshot-anchored predicate (C) is the most technically interesting new proposal: it trades the architectural purity of SnapshotData for precision, eliminating the false positives and, outside the gap itself, the replan storm. The thread is now at a decision point where committer input on architectural acceptability will likely determine the direction.
Still unaddressed
- Tender Wang's iteration/phase-ordering question (inner removal unlocking subsequent left removal) — no response.
- The broader question of whether similar scan-level filters beyond TABLESAMPLE need guarding (foreign tables with LIMIT pushdown, etc.).
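Tender's question concerns ordering between removal passes. One hypothetical illustration: in a query shaped like A LEFT JOIN (B JOIN C ON b.c_id = c.id) ON a.b_id = b.id, assuming b.c_id is a NOT NULL FK to c.id, b.id is unique, and nothing else references B or C, removing C via the FK could leave B alone on the nullable side, and only then could left-join removal drop B. The toy C sketch below encodes exactly that assumed dependency; it is not how the patch or PostgreSQL schedules these passes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy for the query shape
 *   A LEFT JOIN (B JOIN C ON b.c_id = c.id) ON a.b_id = b.id
 * assuming b.c_id is a NOT NULL FK to c.id and b.id is unique. */
typedef struct {
    bool b_present;
    bool c_present;
} ToyQuery;

/* FK-based inner-join removal: drops C (and with it the b.c_id reference). */
static bool inner_removal_pass(ToyQuery *q) {
    if (!q->c_present)
        return false;
    q->c_present = false;
    return true;
}

/* Left-join removal: assumed to need B alone on the nullable side,
 * so it cannot fire while C is still joined to B. */
static bool left_removal_pass(ToyQuery *q) {
    if (!q->b_present || q->c_present)
        return false;
    q->b_present = false;
    return true;
}

/* Runs left removal, then inner removal; optionally repeats until nothing
 * changes. Returns the number of relations removed. */
int remove_joins(bool iterate) {
    ToyQuery q = { true, true };
    int removed = 0;
    bool changed;
    do {
        changed = false;
        if (left_removal_pass(&q)) { removed++; changed = true; }
        if (inner_removal_pass(&q)) { removed++; changed = true; }
    } while (iterate && changed);
    assert(removed <= 2);
    return removed;
}
```

A single left-then-inner pass removes only C (remove_joins(false) returns 1); iterating to a fixpoint also removes B (remove_joins(true) returns 2).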