Key joins

First seen: 2026-05-28 18:47:50+00:00 · Messages: 12 · Participants: 3

Latest Update

2026-06-01 · claude-opus-4-6

Key Joins: A New SQL Language Feature for PostgreSQL

Core Problem

The fundamental problem addressed by this proposal is that SQL's JOIN syntax is semantically overloaded — a JOIN can serve three fundamentally different purposes, and the syntax gives no indication which is intended:

  1. Enrichment (lookup): Adding columns from a referenced table along a foreign key, preserving cardinality of the referencing side
  2. Filtering: Reducing rows when the referenced side has no match
  3. Fan-out: Multiplying rows when the join target has duplicates

This ambiguity is not merely cosmetic — it leads to silent correctness bugs. The canonical example is double-counting in aggregates when multiple 1-to-many joins are composed:

SELECT o.id, SUM(oi.amount), SUM(p.amount)
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN payments    p  ON p.order_id  = o.id
GROUP BY o.id;

Here, both order_items and payments can fan out orders, silently inflating both SUMs. This is a well-known SQL pitfall (referenced from a pgsql-general thread). The problem is that the query author intended enrichment joins, but the database cannot verify that intent.

Proposed Solution: FOR KEY Join Syntax

The patch introduces a new join qualifier syntax:

a LEFT JOIN b FOR KEY (referencing_cols) -> a (referenced_cols)

The -> arrow indicates direction: from the referencing table (the one being joined) to the referenced table (whose rows must remain unique in context). The ASCII arrow syntax was chosen over FROM/TO keywords to avoid confusion with SQL's FROM clause, and aligns with SQL/PGQ's graph pattern syntax which also uses ASCII arrows.

Compile-Time Proof System

The key innovation is a static proof checker that runs at parse/analysis time. When FOR KEY is specified, the system must prove at compile time that:

  1. Row coverage: The referencing columns cover the foreign key
  2. Referenced uniqueness: The referenced columns are provably unique in context (considering preceding joins that may have duplicated rows)
  3. Not-null guarantees: As needed for the proof to hold

If any of these cannot be proven from the schema (constraints, keys, preceding join structure), the query fails with a compile-time error. This is the critical semantic guarantee — the error in the motivating example fires because o.id is not provably unique after a preceding 1-to-many join has already fanned it out.

Proof Fact Propagation

The proof system tracks facts through query structure:

Dependency Tracking

Views using FOR KEY joins depend on the constraints that form their proof. A new dependency type (deptype) is introduced to track which constraints a view's proof relies upon. If those constraints are dropped, the view becomes invalid. This is exposed via a new information_schema.view_constraint_usage view (patch 0003).

Architecture and Design Decisions

Why Column Lists Instead of Constraint Names

The proposal uses explicit column lists (FOR KEY (col1, col2) -> alias (col3, col4)) rather than named foreign key constraints. This is deliberate because:

Matthias van de Meent pushed back on this, arguing that named constraints should be supported as syntactic sugar for the common case. This remains an open design question.

View Expansion During Parse

A notable implementation challenge: views must be expanded during parse analysis for proof checking, since the proof needs to see through view definitions to find underlying constraints. The authors acknowledge this is imperfect and seek feedback.

Concurrency: Serializing Routine Changes with Dependency Recording

Patch 0001 addresses a pre-existing concurrency bug (from the "Parallel INSERT SELECT" thread) where DROP FUNCTION or ALTER FUNCTION can race with dependency lookups. For key joins, this extends further because proof facts depend on functions (e.g., functional dependencies, expression indexes).

The solution: lock referenced procedures before recording dependencies, and have CREATE OR REPLACE FUNCTION / ALTER FUNCTION take conflicting locks before modifying pg_proc. This was reworked after commits 2fbb211 (generic dependency locking) and e2b3573 landed on master.

Revalidation Logic

Key join proofs must be revalidated when underlying schema changes (constraint drops, table alterations). The dependency tracking via the new deptype enables this, but the revalidation logic is acknowledged as complex and needing community review.

Performance: Zero Overhead for Non-Users

The proof-checking logic is designed to impose no overhead on queries that don't use FOR KEY. This is explicitly called out as a design goal, though it adds implementation complexity.

Patch Structure

Patch Purpose
0001 Serialize routine definition changes with dependency recording (concurrency fix)
0002 Core FOR KEY join implementation: parser, proof checker, dependency tracking, revalidation
0003 information_schema.view_constraint_usage — exposes proof dependencies

Syntax Discussion

The arrow syntax (-> and <-) was debated:

Standards Process

This is being submitted as a Change Proposal to ISO/IEC WG 3 (SQL committee) for the June 2026 meeting in Stockholm. The PostgreSQL implementation serves as a reference implementation, with a web demo at keyjoin.org using patched PGLite for browser-based experimentation.

Testing and Stabilization

The patch includes extensive testing:

Open Questions and Future Work

  1. Revalidation logic correctness — needs deeper community review
  2. Constraint name syntax — should named FKs be supported as sugar?
  3. Test suite trimming — current suite is too large for commit
  4. View expansion during parse — acknowledged as needing a better solution
  5. Optimizer implications — if the system proves a join is 1:1, the optimizer could potentially use this for better plans (not discussed in this thread but an obvious follow-on)