Subquery pull-up increases jointree search space

First seen: 2026-02-09 17:22:17+00:00 · Messages: 5 · Participants: 3

Latest Update

2026-05-11 · opus 4.7

Subquery Pull-Up vs. Join Collapse Limits: A Planner Search-Space Problem

The Core Architectural Problem

PostgreSQL's planner has long used two escape hatches — from_collapse_limit and join_collapse_limit (both defaulting to 8) — to bound the combinatorial explosion of make_rel_from_joinlist's dynamic programming search. When the number of base relations in a flattened join problem exceeds these limits, the planner stops merging sub-problems and instead treats the remaining structure as a fixed join order derived from the query's syntactic shape. Beyond geqo_threshold (12), the genetic algorithm takes over entirely.

Lepikhov's observation exposes an unintended interaction between these limits and PostgreSQL's ever-expanding repertoire of subquery pull-up transformations. Over recent releases the planner has become more aggressive about flattening:

  1. ANY/EXISTS SubLinks → semi/anti joins (pull_up_sublinks)
  2. Simple subqueries in FROM → flattened range table entries (pull_up_simple_subquery)
  3. Correlated subplans → joins (recent work cited in [1])
  4. Trivial InitPlans → joins ([2], [3])

Each such transformation adds a base relation to the top-level jointree. When the resulting relation count crosses join_collapse_limit, the planner abandons cost-based join ordering for the entire tail of the jointree. What was previously a cost-optimized plan — even if the subquery was executed as a subplan — can regress catastrophically because the newly promoted relations get mechanically ordered by syntactic position rather than selectivity and join-clause availability.

This is an architectural tension: pull-up is almost always locally beneficial (it exposes join reordering, predicate pushdown, and better cardinality estimation to the optimizer), but globally it can push the problem size past the threshold where the optimizer can actually use those newly-exposed degrees of freedom.

Lepikhov's Refined Diagnosis (May 2026)

The thread's most important technical insight arrives three months after the initial post. Lepikhov revises his framing:

If a SubLink in the WHERE clause is not turned into a JOIN, it acts as a filter at the lowest possible level. When we do transform it, we move it to the top of the join tree.

This is the crux. pull_up_sublinks_qual_recurse converts a WHERE-clause SubLink into a JoinExpr attached at the top of the jointree, effectively making the semi-join the outermost operation. Before pull-up, the SubLink was evaluated as a filter qual on whatever rel its outer references touched — i.e., as deep in the plan as possible. After pull-up, its filtering power is structurally relocated to the outside of the collapse problem, and if the collapse limit is saturated, the planner can no longer reorder it inward.

The proposed remedy is surgical: during pull_up_sublinks_qual_recurse, compute the minimal set of relids the SubLink references (already available in the code), verify none lie on the nullable side of an outer join (a correctness prerequisite for pushing a semi-join down), and attach the new JoinExpr at the narrowest jointree node that covers those relids rather than the top. This preserves the filter's locality. Additionally, the transformation could be skipped entirely when join_collapse_limit is high enough that the topological placement would be corrected naturally — a cheap early-out.

The Design Debate

Tom Lane's Position: Raise the Defaults

Lane (core committer, de facto planner maintainer) pushes back on building specialized machinery. His argument is empirical rather than theoretical: join_collapse_limit = 8 was chosen in ~2004 hardware. Two decades of Moore's Law and planner micro-optimizations may mean the community is leaving performance on the table by being overly conservative with the default. He also calls out geqo_threshold for similar re-examination.

This is a characteristically Lane-style response: prefer tuning a well-understood knob over introducing new mechanism.

Robert Haas's Counter: Growth is Exponential, Not Linear

Haas (core committer) provides the deepest pushback on Lane's suggestion. His points are worth unpacking:

  1. Exponential planning cost means faster hardware yields only logarithmic headroom in the tractable join count. Moving from 8 to 10 or 12 is not a free lunch.
  2. New planner techniques multiply path counts. Partition-wise joins, partition-wise aggregates, parallel path generation, incremental sort, and memoize all inflate the per-relation path space, so the effective cost of N joins today is larger than in 2004 even ignoring pull-up pressure.
  3. Degenerate cases dominate the risk. He cites the pathological shape A LEFT JOIN (B1 ⋈ B2 ⋈ B3 …) LEFT JOIN (C1 ⋈ C2 ⋈ C3 …) …, where the outer-join structure severely constrains legal join orders. GEQO's random orderings are mostly invalid in such shapes, and the join-clause-preferring heuristic inside the DP search produces wildly varying workloads for the same N.
  4. The responsible path forward is to characterize the pathological cases first, mitigate them, then raise defaults — or accept a calculated risk and trust users to lower the knob.

Haas's stance effectively vetoes a naïve default bump without accompanying analysis.

Lepikhov's Pragmatic Middle Ground

Lepikhov (the proposer, an experienced planner contributor working at a PostgreSQL vendor) argues that defaults don't actually matter much because "people never use the default settings now." His real concern is migration regressions: users moving from commercial DBMSes encounter inexplicable slowdowns and need tools to diagnose and work around them. He proposes two lightweight deliverables:

  1. Statistics exposure: counts of flattened sublinks and promoted relations, consumable by extensions via planner_hook to implement replanning strategies.
  2. Ordering discipline: append pulled-up relations to the end of the join list to minimize cost-model disturbance on upgrade, giving users stable behavior across versions.

The May follow-up supersedes both with the sharper structural fix: place the new JoinExpr at the minimal covering jointree position.

Implications for pull_up_sublinks

The concrete code locus is pull_up_sublinks_qual_recurse in src/backend/optimizer/prep/prepjointree.c. Currently this function, when it succeeds in converting a SubLink, returns a JoinExpr that the caller splices into the jointree at whatever level the recursion had reached. The top-level caller in pull_up_sublinks_jointree_recurse tends to consolidate these into the outermost FROM list.

Lepikhov's proposed change would require:

This is analogous to, but distinct from, the outer-join identity-3 pushdown work that has been active in recent releases. The nullable-side check is the same correctness condition that governs qual pushdown in distribute_qual_to_rels.

Why This Matters Beyond the Immediate Case

The thread points to a broader planner design question: as PostgreSQL accumulates more pre-planner transformations (pull-up, qual simplification, constant folding, join removal, partition pruning), the jointree handed to make_one_rel drifts further from the user's written query. Each transformation is justified in isolation, but their interaction with the DP search bounds is not systematically managed. Lepikhov also alludes to partition pruning's state-machine limitations as another symptom of the same disease — planner quality being non-monotonic in transformation aggressiveness.

A principled fix might eventually require the planner to budget its search space: decide which transformations to apply based on the resulting problem size, rather than always applying them and hoping the collapse limit absorbs the consequences. No one in the thread proposes this directly, but it is the logical extrapolation of Haas's "characterize the pathological cases" program.

Status

As of the last message (May 2026), the discussion has narrowed from a broad "statistics + replanning infrastructure" proposal to a specific, targeted change in pull_up_sublinks_qual_recurse. No patch has been posted; only demonstration cases. Lane and Haas have expressed reservations about specialized mechanisms but have not engaged with the refined structural proposal. The thread appears to be awaiting a concrete patch.