Key Joins: A New SQL Language Feature for PostgreSQL
Core Problem
The fundamental problem addressed by this proposal is that SQL's JOIN syntax is semantically overloaded — a JOIN can serve three fundamentally different purposes, and the syntax gives no indication which is intended:
- Enrichment (lookup): Adding columns from a referenced table along a foreign key, preserving cardinality of the referencing side
- Filtering: Reducing rows when the referenced side has no match
- Fan-out: Multiplying rows when the join target has duplicates
This ambiguity is not merely cosmetic — it leads to silent correctness bugs. The canonical example is double-counting in aggregates when multiple 1-to-many joins are composed:
SELECT o.id, SUM(oi.amount), SUM(p.amount)
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN payments p ON p.order_id = o.id
GROUP BY o.id;
Here, both order_items and payments can fan out orders, silently inflating both SUMs. This is a well-known SQL pitfall (referenced from a pgsql-general thread). The problem is that the query author intended enrichment joins, but the database cannot verify that intent.
Proposed Solution: FOR KEY Join Syntax
The patch introduces a new join qualifier syntax:
a LEFT JOIN b FOR KEY (referencing_cols) -> a (referenced_cols)
The -> arrow indicates direction: from the referencing table (the one being joined) to the referenced table (whose rows must remain unique in context). The ASCII arrow syntax was chosen over FROM/TO keywords to avoid confusion with SQL's FROM clause, and aligns with SQL/PGQ's graph pattern syntax which also uses ASCII arrows.
Compile-Time Proof System
The key innovation is a static proof checker that runs at parse/analysis time. When FOR KEY is specified, the system must prove at compile time that:
- Row coverage: The referencing columns cover the foreign key
- Referenced uniqueness: The referenced columns are provably unique in context (considering preceding joins that may have duplicated rows)
- Not-null guarantees: As needed for the proof to hold
If any of these cannot be proven from the schema (constraints, keys, preceding join structure), the query fails with a compile-time error. This is the critical semantic guarantee — the error in the motivating example fires because o.id is not provably unique after a preceding 1-to-many join has already fanned it out.
Proof Fact Propagation
The proof system tracks facts through query structure:
- Uniqueness facts propagate through simple
GROUP BYandDISTINCT - Row-coverage facts can pass through
GROUPING SETS,ROLLUP, andCUBEwhen a grouping set contains all key columns under the same key identity (revised in v7) - Facts do NOT propagate through subqueries or CTEs in the general case (necessitating the column-list syntax over constraint-name references)
Dependency Tracking
Views using FOR KEY joins depend on the constraints that form their proof. A new dependency type (deptype) is introduced to track which constraints a view's proof relies upon. If those constraints are dropped, the view becomes invalid. This is exposed via a new information_schema.view_constraint_usage view (patch 0003).
Architecture and Design Decisions
Why Column Lists Instead of Constraint Names
The proposal uses explicit column lists (FOR KEY (col1, col2) -> alias (col3, col4)) rather than named foreign key constraints. This is deliberate because:
- Constraint names are not universally applicable (CTEs, subqueries, self-joins, multiple references to same table)
- Column lists work uniformly in all contexts
- The proof checker validates against the schema regardless
Matthias van de Meent pushed back on this, arguing that named constraints should be supported as syntactic sugar for the common case. This remains an open design question.
View Expansion During Parse
A notable implementation challenge: views must be expanded during parse analysis for proof checking, since the proof needs to see through view definitions to find underlying constraints. The authors acknowledge this is imperfect and seek feedback.
Concurrency: Serializing Routine Changes with Dependency Recording
Patch 0001 addresses a pre-existing concurrency bug (from the "Parallel INSERT SELECT" thread) where DROP FUNCTION or ALTER FUNCTION can race with dependency lookups. For key joins, this extends further because proof facts depend on functions (e.g., functional dependencies, expression indexes).
The solution: lock referenced procedures before recording dependencies, and have CREATE OR REPLACE FUNCTION / ALTER FUNCTION take conflicting locks before modifying pg_proc. This was reworked after commits 2fbb211 (generic dependency locking) and e2b3573 landed on master.
Revalidation Logic
Key join proofs must be revalidated when underlying schema changes (constraint drops, table alterations). The dependency tracking via the new deptype enables this, but the revalidation logic is acknowledged as complex and needing community review.
Performance: Zero Overhead for Non-Users
The proof-checking logic is designed to impose no overhead on queries that don't use FOR KEY. This is explicitly called out as a design goal, though it adds implementation complexity.
Patch Structure
| Patch | Purpose |
|---|---|
| 0001 | Serialize routine definition changes with dependency recording (concurrency fix) |
| 0002 | Core FOR KEY join implementation: parser, proof checker, dependency tracking, revalidation |
| 0003 | information_schema.view_constraint_usage — exposes proof dependencies |
Syntax Discussion
The arrow syntax (-> and <-) was debated:
- Laurenz Albe suggested
FOR KEY FROM (col1) TO (col2)orFOR KEY (col1) TO (col2)for more SQL-like appearance - Joel Jacobson explained that
FROMin a join clause creates parsing ambiguity with theFROMclause, and that three years of user feedback favored arrows - The precedent of SQL/PGQ using ASCII arrows was cited
- Laurenz accepted the arrow approach
Standards Process
This is being submitted as a Change Proposal to ISO/IEC WG 3 (SQL committee) for the June 2026 meeting in Stockholm. The PostgreSQL implementation serves as a reference implementation, with a web demo at keyjoin.org using patched PGLite for browser-based experimentation.
Testing and Stabilization
The patch includes extensive testing:
- A "massive test suite" (acknowledged as needing trimming for committability)
- Injection-point isolation tests for concurrency/race conditions
- ICU-dependent tests separated into guarded test files
- Multiple iterations (v1-v8) to fix build issues across platforms (NetBSD/FreeBSD), assertion-only variable warnings, regression test naming conventions, and nondeterministic isolation test ordering
Open Questions and Future Work
- Revalidation logic correctness — needs deeper community review
- Constraint name syntax — should named FKs be supported as sugar?
- Test suite trimming — current suite is too large for commit
- View expansion during parse — acknowledged as needing a better solution
- Optimizer implications — if the system proves a join is 1:1, the optimizer could potentially use this for better plans (not discussed in this thread but an obvious follow-on)