Inner Join Removal via Foreign Keys — Architectural Analysis
The Core Problem
PostgreSQL's planner has long supported left join removal: if the RHS of a LEFT JOIN is guaranteed to match at most one row and nothing above the join references the RHS, the join can be discarded (join_is_removable() in analyzejoins.c). The asymmetry is that for an inner join, correctness requires a stronger guarantee: every LHS row must match exactly one RHS row (not zero, not more). This two-sided guarantee is precisely what a validated FOREIGN KEY plus a UNIQUE/PRIMARY KEY on the referenced side gives you — modulo NULLs in the referencing columns (which the proposed patch handles by synthesizing IS NOT NULL filters on any FK columns not declared NOT NULL).
The motivation is not purely cosmetic. Richard Guo explicitly ties this to SQL/PGQ property graph queries (the new GRAPH_TABLE / MATCH machinery). A k-hop graph pattern expands into a chain of edge-table joins interleaved with vertex-table lookups; FK-based removal collapses a 7-way join into a 3-way join in his example. Under the graph-query workload this is not a micro-optimization — it changes the planner's asymptotic behavior on patterns of realistic size.
A secondary, more subtle optimization mentioned by Richard is that once you know the FK relationship holds, the referenced PK columns become interchangeable with the FK columns in EquivalenceClasses even above the removed join. That is what allows multi-hop "bridging" through an intermediate vertex table to collapse — not just the immediate join.
Why This Was a Dead End in 2014 — the RI Trigger Gap
David Rowley (who attempted the same optimization roughly a decade earlier) immediately identified the fatal objection: PostgreSQL's referential integrity is not enforced by the storage engine; it is enforced by AFTER ROW triggers that fire at end-of-statement (or later, if the constraint is DEFERRED). That creates an intra-statement window — Richard christens it the "T2 Gap" — where:
- The PK row has already been marked dead by setting xmax,
- The RI cascade trigger is queued but has not fired,
- The dependent FK row in the child table is still live and visible.
During this window, user-visible code can run: RETURNING expressions, volatile functions in the DML, user-written AFTER ROW triggers firing before the RI trigger, etc. Any such code executing a query that relied on the FK invariant for plan correctness would observe a child row with no parent, and join removal would yield wrong answers. Because the planner operates on static schema guarantees and has no visibility into the deferred-trigger queue, a purely plan-time approach appeared unsalvageable. Heikki's 2014 message (ref [3] from David) is the canonical statement of this problem.
This is architecturally important: it means FK constraints in PostgreSQL are a weaker semantic guarantee than UNIQUE or NOT NULL, which are enforced synchronously at the heap/btree level. Richard concedes he had been mentally lumping FKs in with those, which was the source of his initial misstep.
Richard's Proposed Escape Hatch: Lock-Manager Probing
Rather than abandoning the optimization, Richard proposes a novel runtime predicate: use the local lock manager as a proxy for "could the trigger gap be open on this relation?" The reasoning:
- Any DML on a table acquires
RowExclusiveLock before modifying rows.
RowExclusiveLock is held until end-of-transaction.
- Therefore, if no backend-local
RowExclusiveLock exists on the referencing table, no trigger gap can be active in this transaction, and FK-based removal is safe.
The check is implemented via CheckRelationOidLockedByMe(), which is cheap (local hash lookup, no shared-lock-table traffic). There is precedent for the planner consulting local lock state, so this is not an architectural violation.
For cached plans, the predicate cannot be baked into the generic plan because lock state is dynamic. Richard therefore modifies choose_custom_plan() so that if any relation involved in an FK-removal currently holds RowExclusiveLock in this backend, the planner falls back to a custom plan (which won't have performed the removal). This is the correct integration point — it is exactly analogous to how partition pruning interacts with plan caching.
The Snapshot-Lifetime Refinement (v-rebase, April 28)
Richard's April 28 message contains the most subtle piece of reasoning in the thread. He initially believed the predicate only needed to hold during the trigger-gap window itself. Tom Lane's 2015 observation (cited as [1] in that message) forced a reconsideration: a snapshot captured inside the gap can outlive the gap. Concretely, a STABLE function invoked from within the DML inherits the caller's snapshot; if that function later executes a query planned with FK-removal, it will see the inconsistency even though the gap has technically closed.
Richard evaluates and rejects several tighter predicates — a per-statement counter in ExecInitModifyTable, AfterTriggerPendingOnRel() — because each of them goes false before all potentially-affected snapshots have been released. The lock-based predicate survives precisely because RowExclusiveLock is transaction-scoped, which by construction outlives any in-transaction snapshot. The false-positive cost (the optimization remains disabled for the rest of the transaction after any DML, after ROLLBACK TO SAVEPOINT, or when a user has taken LOCK TABLE) is the price paid for a provably sound lifetime bound.
This is a genuinely clever argument. Whether committers accept it will hinge on (a) whether they believe there is no fourth class of snapshot-outliving-lock that Richard hasn't enumerated, and (b) whether the false-positive rate in real workloads is tolerable given that mixed read/write transactions are extremely common.
A Second Bug Found During Review — Over-Conservative joininfo Check
Tender Wang's April 30 review exposes a planner subtlety unrelated to the trigger-gap question. With the test query:
SELECT n.* FROM nation n LEFT JOIN (orders o JOIN users u ON o.user_id = u.id) ON n.id = o.id;
the inner join o JOIN u is not removed, even though it is demonstrably removable. Root cause: inner_join_is_removable() bails if ref_rel->joininfo is non-empty. But the clause sitting in users.joininfo here is n.id = o.id — which does not reference users at all. It appears there because it is a non-degenerate outer-join ON clause, and reconsider_outer_join_clauses() (via required_relids) pins such clauses to the minimum input relid set of the outer join to prevent them being evaluated at the wrong syntactic level. Richard confirms this is a known conservatism and the May 1 patch tightens the test to check clause_relids rather than list emptiness.
This is a useful reminder that joininfo membership does not imply referential dependency — a distinction that comes up repeatedly in analyzejoins.c work.
Design Tensions and Open Questions
-
False positives vs. simplicity. The lock-based predicate means any transaction that has done even a single unrelated DML, or taken an explicit table lock, loses the optimization for the duration. For OLTP workloads with BEGIN; UPDATE ...; SELECT ...; COMMIT patterns, this may negate most of the benefit. No one has yet quantified this.
-
Is the lock-lifetime argument airtight? Richard's argument rests on "every in-transaction snapshot has been released by the time RowExclusiveLock drops." This is true for normal snapshots, but cursors with WITH HOLD, some logical-decoding paths, and cross-transaction snapshot export (pg_export_snapshot) could stress the invariant. The thread hasn't probed these.
-
Executor-level alternative. Richard acknowledges David's earlier executor-level attempts but notes they were too invasive. The planner-side approach with a lock-manager guard is the first credible middle path.
-
Interaction with REFERENCES ... ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED. Deferred constraints extend the "gap" to end-of-transaction. The lock-based predicate handles this automatically (the lock is still held), which is a nice property of the design, but it further widens the false-positive envelope.
Significance
If the lock-manager guard withstands scrutiny, this would be the first time PostgreSQL has successfully weaponized FK constraints for plan-time join elimination — a capability that competing systems (SQL Server, Oracle) have had for years but that PostgreSQL's trigger-based RI has historically blocked. The graph-query motivation makes the timing particularly apt, since SQL/PGQ workloads are the first class of queries where the optimization moves from "nice" to "essential."