Overview
Andrei Lepikhov proposes extending PostgreSQL's join planning to opportunistically generate pre-sorted outer paths for NestLoop joins when the outer relation's columns appear as a prefix of query_pathkeys (the ORDER BY / GROUP BY / DISTINCT ordering demand). The motivation is a common "top-N with details" pattern: a driving table joined via several LEFT JOINs to property/lookup tables, then ORDER BY driver.col LIMIT k. Today the planner frequently sorts after all the joins, even though the sort key is functionally determined by the outer side alone.
The Architectural Problem
PostgreSQL's match_unsorted_outer considers an outer path's existing pathkeys when building mergejoin candidates, and considers arbitrary outer orderings only when an index or other cheap sorted path already exists. For NestLoop, the outer ordering is preserved to the join output (since NestLoop emits outer tuples in outer order, with inner matches appended), so a sort on the outer side alone is sufficient to satisfy a top-level ORDER BY whose keys come only from the outer relation.
Two facts combine to make this valuable:
- Pathkey preservation through NestLoop + LEFT JOIN. A left outer NestLoop preserves outer ordering in the result, so
ORDER BY outer.colcan be satisfied below the join. The upper Sort becomes redundant. - Bounded-sort propagation under LIMIT. If a
Limitknows its tuple bound,ExecSetTupleBoundrecursively informs the underlying Sort, which switches to a top-N heapsort (bounded in memory and CPU). Today this propagation stops at joins because there is no clear semantics for "how many outer tuples do I need to produce N join rows." For LEFT OUTER joins, however, each outer tuple produces at least one output row, so the bound can be pushed into the outer subtree safely.
The payoff is large: instead of sorting the fully materialized join output (possibly spilling to disk), you sort only the outer relation with a bounded heap of size N, then do N index/hash lookups into the inner sides.
Why an index isn't the answer
A reviewer might naturally respond "just build an index on popularity." Lepikhov preempts this:
- It may be wasteful when the sort column is low-value or frequently updated.
- It is impossible when the outer is not a plain table —
FunctionScan,SubqueryScan, CTE, or, critically, anAppendover foreign tables (partitioned FDW setups), where only an explicit Sort node can establish the ordering.
This last case is the strongest architectural justification: without this patch, partitioned/sharded foreign-table configurations cannot exploit the top-N-with-details pattern at all.
Patch Structure
The posted sketch has three parts:
1. Executor: extending ExecSetTupleBound
The existing bound-propagation machinery stops at join nodes. The patch teaches it to recurse into the outer child of a join when that is semantically sound (outer joins, or inner joins where each outer row produces ≥1 row — though the initial patch appears focused on the OUTER JOIN case). This is described as necessary because there is "no proper hook in this subsystem," i.e., extensions cannot plug into bound propagation; the change must live in core.
2. Planner: match_unsorted_outer modification
When building NestLoop paths, if (a) the outer relation's Vars form a prefix of query_pathkeys and (b) no existing sorted path on that prefix is present, synthesize an explicit Sort atop the outer path and generate a NestLoop that inherits those pathkeys. This lets upper-level path generation (grouping, DISTINCT, final ORDER BY) see the join as already sorted and skip the redundant top-level Sort.
3. Regression test adjustments
Plan shapes change in a number of existing regression queries — used by Lepikhov as evidence that the optimization enables better MergeJoin and aggregation plans beyond the motivating top-N case.
Cost Model Concern
Lepikhov flags a honesty issue in the cost model: when costing the outer Sort, he uses the LIMIT tuple bound (top-N heapsort cost), even though at plan time the bound is only propagated through OUTER join semantics. This makes the pre-sorted path look artificially cheap versus alternatives. He dismisses it as low-risk because it concerns in-memory sorts, but this is exactly the kind of local shortcut that can cause systemic plan regressions — and the May follow-up vindicates that concern.
The May 2026 Regression: Planning-Time Blowup
Five weeks later Lepikhov reports a serious real-world regression. On a large query:
| Presorted enabled | Disabled | |
|---|---|---|
| Planning time | 3194.9 ms | 1328.9 ms |
| Execution time | 17.7 ms | 11.8 ms |
| Planner memory | 136930 kB | 136225 kB |
Planning time more than doubled, execution got worse, and memory grew. The culprit is relation_can_be_sorted_early, which the patch calls to check whether a given relation's target can plausibly produce the needed sort prefix. Its cost profile:
- Walks
rel->reltarget->exprs. - For each expression, calls
find_ec_member_matching_expr, which linearly scans the equivalence class'sec_members. - Falls through to
find_computable_ec_member, which walks the EC again using parser-level machinery to detect whether the key is computable from the relation.
For queries with CASE expressions as sort keys spanning multiple tables, no ec_member exactly matches any reltarget entry, so every call runs the full slow path and returns false. The patch invokes this for many (join-rel, pathkey-prefix) combinations, compounding quadratically with query size.
Proposed fix: cache useful pathkeys per RelOptInfo
Lepikhov's remediation is architectural rather than micro-optimizing: compute once, at RelOptInfo creation, which query_pathkeys prefixes this relation could satisfy early — a useful_query_pathkeys cache analogous to the existing useful_pathkeys logic for indexes (build_index_pathkeys, truncate_useless_pathkeys). Then match_unsorted_outer consults the cache in O(1) instead of re-deriving it for every join pair.
This is the right shape of fix: the EC membership graph is stable across the join search, so per-relation memoization is natural. The parallel with how index pathkeys are pre-computed is strong precedent.
Key Insights and Implications
- Pathkey-preserving joins plus bounded-sort propagation under OUTER JOINs is an underexploited combination. The correctness argument (every outer row → ≥1 output row under LEFT JOIN) is a property the planner already knows; the patch simply teaches the executor and the join-path generator to exploit it jointly.
- The feature's strongest case is foreign/partitioned outers, where no index is available and only an explicit Sort establishes ordering. This elevates the patch from "nice micro-optimization" to "enables a plan shape previously unreachable."
- The cost-model shortcut (assume LIMIT reaches the Sort) is theoretically dangerous, though practically acceptable for in-memory bounded sorts. If a reviewer pushes back here, a cleaner design would cost the unbounded Sort and track the bound-propagation as a separate path property.
- The planning-time regression is not an implementation detail — it is a design-level signal.
match_unsorted_outerruns inside the join search, which is combinatorial. Any per-call work that touches ECs must be either O(1) amortized or hoisted. This is a recurring lesson in planner extensions.
Prior Art and Relationship
The patch explicitly builds on two prior threads:
- 2018 "push down SORT and LIMIT nodes" (Gregory Stark-era discussion) — tackled similar territory but more narrowly.
- 2025 "Wasteful nested loop join when there is
limitin the query" — a direct predecessor problem report.
Lepikhov positions this patch as the general solution rather than a point fix for one plan shape.
Status
As of the thread's latest message, the patch is in rework: Lepikhov has acknowledged the planning-time regression, identified its cause in repeated relation_can_be_sorted_early calls, and proposed caching as the structural fix. No committer feedback appears in the provided messages; the thread is effectively a self-authored design evolution at this point.