Monthly Summary: Join Statistics for the PostgreSQL Planner (May 2026)
Overview
May 2026 saw the "join statistics" patch transition from conceptual design convergence to active committer-level review, with rapid iteration through patch versions v5–v7 and substantive architectural feedback from Tom Lane. The thread matured from "is this feasible?" to "what exactly should the catalog look like and what are the remaining blockers before commit?"
Key Developments
Patch Stabilization (v5–v7)
Alexandra posted three rapid revisions addressing bugs identified by Jian He (DDL interactions with join statistics) and CI platform instabilities:
- v5: Fixed four DDL bugs (self-join + ALTER TYPE, cross-table ALTER TYPE, generated column ALTER EXPRESSION, whole-row variables in join quals). Dropped
catversion.hchanges to reduce rebase churn. - v6/v7: Fixed platform-dependent test output (NetBSD/Windows) by changing test types and stabilizing regression tests.
Tom Lane's Architectural Review (Major Milestone)
Tom provided the first detailed committer-level design review since the thread's early exchanges:
- Scope endorsement: Explicitly blessed 2-way-only and index-sampling-only restrictions for v1, provided the catalog is N-way ready. This removes ambiguity about commit requirements.
- Catalog simplification: Proposed including the anchor relation's OID as the first element of
stxjoinrels[]to eliminate special-casing in expression varno resolution. - Eliminate
stxkeys[]: Floated replacing the dual-trackstxkeys[]/stxexprsrepresentation with expression trees for all target columns (simple Vars as Var nodes). This would also eliminatestxkeyrefs[]. - Permissions gap flagged: Current same-owner requirement is unworkable; a concrete permissions model is a hard requirement before commit.
- Documentation requested as a design-clarification exercise.
Permissions Model Convergence
Both Tom and Tomas weighed in with concrete proposals:
- Tomas:
stxownershould have SELECT privilege on joined relations (not ownership). - Tom: Ownership on the anchor table, SELECT on the rest. Partially dismissed information-leakage concerns since users who benefit from join stats already have SELECT on the relevant tables. Flagged SELECT revocation as an unresolved edge case.
New Reviewer: Lifecycle Semantics Questions (Chengpeng Yan)
Three substantive questions that sharpened previously-open issues:
- Index dependency contradiction:
CREATE STATISTICSblocksDROP INDEXvia catalog dependency, butANALYZEre-discovers indexes at runtime. Is the index part of the persistent contract or merely a creation-time proof? - Cross-table staleness: Join stats owned by the anchor relation become stale when the probed table changes. Concrete failure scenario for the ANALYZE-trigger problem.
- MCV frequency semantics for non-unique joins: When one anchor row produces multiple joined rows (non-FK joins),
raw_selnormalization diverges. Correctness concern for non-FK joins not previously articulated.
Tomas Vondra: N-way Is Important
Tomas strengthened his position that N-way support is not merely nice-to-have, citing fact-table-joining-two-correlated-dimensions as a concrete case that 2-way stats cannot handle. He endorses enforcing index requirements at DDL time rather than silently failing during ANALYZE.
Current State
The patch is architecturally sound but has several concrete blockers before commit consideration:
- Permissions model needs implementation
- Catalog representation may be simplified per Tom's suggestions (stxkeys elimination)
- Index dependency lifecycle semantics need resolution
- Documentation required
- N-way catalog readiness needs verification against Tom's simplification proposals
The measurement results remain strong: geometric mean q-error improvement from 103× to 4.2× on the keyword/movie_keyword join, JOB median warm runtime 114s → 74s (1.54× speedup).