2026-06-01 · claude-opus-4-6

Key Joins: A New SQL Language Feature for PostgreSQL

Core Problem

The fundamental problem addressed by this proposal is that SQL's JOIN syntax is semantically overloaded — a JOIN can serve three fundamentally different purposes, and the syntax gives no indication which is intended:

Enrichment (lookup): Adding columns from a referenced table along a foreign key, preserving cardinality of the referencing side
Filtering: Reducing rows when the referenced side has no match
Fan-out: Multiplying rows when the join target has duplicates

This ambiguity is not merely cosmetic — it leads to silent correctness bugs. The canonical example is double-counting in aggregates when multiple 1-to-many joins are composed:

SELECT o.id, SUM(oi.amount), SUM(p.amount)
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN payments    p  ON p.order_id  = o.id
GROUP BY o.id;

Here, both order_items and payments can fan out orders, silently inflating both SUMs. This is a well-known SQL pitfall (referenced from a pgsql-general thread). The problem is that the query author intended enrichment joins, but the database cannot verify that intent.

Proposed Solution: `FOR KEY` Join Syntax

The patch introduces a new join qualifier syntax:

a LEFT JOIN b FOR KEY (referencing_cols) -> a (referenced_cols)

The -> arrow indicates direction: from the referencing table (the one being joined) to the referenced table (whose rows must remain unique in context). The ASCII arrow syntax was chosen over FROM/TO keywords to avoid confusion with SQL's FROM clause, and aligns with SQL/PGQ's graph pattern syntax which also uses ASCII arrows.

Compile-Time Proof System

The key innovation is a static proof checker that runs at parse/analysis time. When FOR KEY is specified, the system must prove at compile time that:

Row coverage: The referencing columns cover the foreign key
Referenced uniqueness: The referenced columns are provably unique in context (considering preceding joins that may have duplicated rows)
Not-null guarantees: As needed for the proof to hold

If any of these cannot be proven from the schema (constraints, keys, preceding join structure), the query fails with a compile-time error. This is the critical semantic guarantee — the error in the motivating example fires because o.id is not provably unique after a preceding 1-to-many join has already fanned it out.

Proof Fact Propagation

The proof system tracks facts through query structure:

Uniqueness facts propagate through simple GROUP BY and DISTINCT
Row-coverage facts can pass through GROUPING SETS, ROLLUP, and CUBE when a grouping set contains all key columns under the same key identity (revised in v7)
Facts do NOT propagate through subqueries or CTEs in the general case (necessitating the column-list syntax over constraint-name references)

Dependency Tracking

Views using FOR KEY joins depend on the constraints that form their proof. A new dependency type (deptype) is introduced to track which constraints a view's proof relies upon. If those constraints are dropped, the view becomes invalid. This is exposed via a new information_schema.view_constraint_usage view (patch 0003).

Architecture and Design Decisions

Why Column Lists Instead of Constraint Names

The proposal uses explicit column lists (FOR KEY (col1, col2) -> alias (col3, col4)) rather than named foreign key constraints. This is deliberate because:

Constraint names are not universally applicable (CTEs, subqueries, self-joins, multiple references to same table)
Column lists work uniformly in all contexts
The proof checker validates against the schema regardless

Matthias van de Meent pushed back on this, arguing that named constraints should be supported as syntactic sugar for the common case. This remains an open design question.

View Expansion During Parse

A notable implementation challenge: views must be expanded during parse analysis for proof checking, since the proof needs to see through view definitions to find underlying constraints. The authors acknowledge this is imperfect and seek feedback.

Concurrency: Serializing Routine Changes with Dependency Recording

Patch 0001 addresses a pre-existing concurrency bug (from the "Parallel INSERT SELECT" thread) where DROP FUNCTION or ALTER FUNCTION can race with dependency lookups. For key joins, this extends further because proof facts depend on functions (e.g., functional dependencies, expression indexes).

The solution: lock referenced procedures before recording dependencies, and have CREATE OR REPLACE FUNCTION / ALTER FUNCTION take conflicting locks before modifying pg_proc. This was reworked after commits 2fbb211 (generic dependency locking) and e2b3573 landed on master.

Revalidation Logic

Key join proofs must be revalidated when underlying schema changes (constraint drops, table alterations). The dependency tracking via the new deptype enables this, but the revalidation logic is acknowledged as complex and needing community review.

Performance: Zero Overhead for Non-Users

The proof-checking logic is designed to impose no overhead on queries that don't use FOR KEY. This is explicitly called out as a design goal, though it adds implementation complexity.

Patch Structure

Patch	Purpose
0001	Serialize routine definition changes with dependency recording (concurrency fix)
0002	Core `FOR KEY` join implementation: parser, proof checker, dependency tracking, revalidation
0003	`information_schema.view_constraint_usage` — exposes proof dependencies

Syntax Discussion

The arrow syntax (-> and <-) was debated:

Laurenz Albe suggested FOR KEY FROM (col1) TO (col2) or FOR KEY (col1) TO (col2) for more SQL-like appearance
Joel Jacobson explained that FROM in a join clause creates parsing ambiguity with the FROM clause, and that three years of user feedback favored arrows
The precedent of SQL/PGQ using ASCII arrows was cited
Laurenz accepted the arrow approach

Standards Process

This is being submitted as a Change Proposal to ISO/IEC WG 3 (SQL committee) for the June 2026 meeting in Stockholm. The PostgreSQL implementation serves as a reference implementation, with a web demo at keyjoin.org using patched PGLite for browser-based experimentation.

Testing and Stabilization

The patch includes extensive testing:

A "massive test suite" (acknowledged as needing trimming for committability)
Injection-point isolation tests for concurrency/race conditions
ICU-dependent tests separated into guarded test files
Multiple iterations (v1-v8) to fix build issues across platforms (NetBSD/FreeBSD), assertion-only variable warnings, regression test naming conventions, and nondeterministic isolation test ordering

Open Questions and Future Work

Revalidation logic correctness — needs deeper community review
Constraint name syntax — should named FKs be supported as sugar?
Test suite trimming — current suite is too large for commit
View expansion during parse — acknowledged as needing a better solution
Optimizer implications — if the system proves a join is 1:1, the optimizer could potentially use this for better plans (not discussed in this thread but an obvious follow-on)

Key joins

Latest Update

Incremental Update: Errata Fix to Key Join Paper (Section 7.4.13)

What happened:

Technical note on the corrected example:

Key Joins: A New SQL Language Feature for PostgreSQL

Core Problem

Proposed Solution: `FOR KEY` Join Syntax

Compile-Time Proof System

Proof Fact Propagation

Dependency Tracking

Architecture and Design Decisions

Why Column Lists Instead of Constraint Names

View Expansion During Parse

Concurrency: Serializing Routine Changes with Dependency Recording

Revalidation Logic

Performance: Zero Overhead for Non-Users

Patch Structure

Syntax Discussion

Standards Process

Testing and Stabilization

Open Questions and Future Work

Key joins

Latest Update

Incremental Update: Errata Fix to Key Join Paper (Section 7.4.13)

What happened:

Technical note on the corrected example:

Key Joins: A New SQL Language Feature for PostgreSQL

Core Problem

Proposed Solution: FOR KEY Join Syntax

Compile-Time Proof System

Proof Fact Propagation

Dependency Tracking

Architecture and Design Decisions

Why Column Lists Instead of Constraint Names

View Expansion During Parse

Concurrency: Serializing Routine Changes with Dependency Recording

Revalidation Logic

Performance: Zero Overhead for Non-Users

Patch Structure

Syntax Discussion

Standards Process

Testing and Stabilization

Open Questions and Future Work

Proposed Solution: `FOR KEY` Join Syntax