GOO (Greedy Operator Ordering) as a Potential GEQO Replacement
The Core Problem
PostgreSQL's join-order search has two regimes: exhaustive dynamic programming (DP) in make_rel_from_joinlist / standard_join_search, and GEQO (Genetic Query Optimizer) which kicks in once the number of baserels exceeds geqo_threshold (default 12). Both regimes have well-known limitations:
- DP scales combinatorially. For N relations it enumerates O(3^N) join subsets in the worst case (though cross-product pruning helps). Beyond ~15 relations, memory and planning time become prohibitive.
- GEQO is a randomized genetic search. It is stochastic (non-deterministic across runs unless `geqo_seed` is fixed), has significant constant-factor overhead (population initialization, mutation, multiple full path builds per individual), and in practice its planning cost often exceeds DP's for medium-sized joins. Crucially, Tomas Vondra's measurements on TPC-DS show GEQO being slower than DP even at `geqo_threshold=2` — counterintuitive for what is supposed to be a "cheap" heuristic.
This thread proposes a third option: GOO (Greedy Operator Ordering) based on Fegaras's 1998 DEXA paper. GOO's algorithm is simple:
- Enumerate all currently-legal join pairs among remaining relations.
- Rank them by some greedy key (cost, output rows, result bytes, selectivity).
- Merge the cheapest/smallest pair into a single "super-relation."
- Repeat until one relation remains.
Each iteration is O(N²) in the worst case; total complexity is roughly O(N³), far cheaper than DP or GEQO's configured generations × population.
Architectural Significance
GOO is not merely a performance tweak. It probes a structural question in the planner: how much of the join-ordering problem can be solved by a deterministic, single-pass contraction versus full DP enumeration? The answer matters because:
- Large analytical joins (star/snowflake, JOB-Complex, data-vault schemas with 20+ tables) are increasingly common.
- GEQO's non-determinism is painful for plan stability, EXPLAIN comparison, and tools like `pg_plan_advice`.
- A deterministic greedy could serve as a seed for DP, for GEQO, or for the hybrid "effort-budgeted" planner that the submitter proposes at the end of the thread.
The Patch's Evolution
The patch series went through several iterations, and the trajectory is itself informative:
- v1: Raw GOO using `cheapest_total_path->total_cost` as the greedy key. Crashes on any TPC-DS query containing aggregates because the greedy loop never calls `set_cheapest()` along paths required by eager aggregation — `sort_inner_and_outer()` ends up dereferencing a NULL `inner_path`.
- v2: Fixes the eager-aggregation interaction; all 99 TPC-DS queries plan successfully.
- v3: Adds a `goo_greedy_strategy` GUC for experimental evaluation of four keys: cardinality, selectivity, result size in bytes, and total cost.
- v4: Introduces the combined strategy — run GOO twice (once by cost, once by result_size), then pick the candidate plan with the lowest final estimated `total_cost`. This is the key robustness idea: use the greedy algorithm as a candidate generator, with cost as the selector.
- v5: Splits into 0001 (core GOO, cost-only) and 0002 (strategy layer including the new `selectivity` heuristic). Adds tie-breaking determinism.
Key Technical Findings
1. Greedy keys have qualitatively different failure modes
On TPC-H SF=1, the submitter identified two distinct pathological patterns:
- Q20 (output-oriented failure): A `partsupp ⋈ aggregated-lineitem` join is estimated at tens of rows but actually produces hundreds of thousands. Output-oriented keys (rows, result_size, selectivity) aggressively prefer this join because it appears to shrink the result. The misestimate is amplified into catastrophic downstream cost.
- Q7 (cost-oriented failure): A locally cheap join is selected early, producing a many-to-many intermediate that cannot be filtered until fact-table joins arrive later. This is inherent to greediness, not a bug in any particular key.
Tomas's response is sharp and correct: Q20 is "garbage in, garbage out" and cannot be solved at the search level. Q7 is the textbook limitation of greedy algorithms and is also not solvable by picking a different scalar key.
2. Selectivity is scale-free and therefore brittle
The v5 JOB-Complex run produced the most damning evidence: GOO(selectivity) causes 6 timeouts and is 6000× slower than GEQO on JOB 24b and 29a. The submitter's diagnosis is precise — selectivity measures relative reduction (join_size / (left × right)) and discards absolute magnitude. A highly selective join between two huge inputs still produces a huge intermediate. This rules out selectivity as a primary key but still leaves it useful as one of several candidates.
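The scale-freeness is easy to see numerically. The following toy comparison (numbers invented for illustration) shows two joins with identical selectivity but wildly different absolute output sizes:

```python
def selectivity(left_rows, right_rows, join_rows):
    """Scale-free greedy key: measures relative reduction only."""
    return join_rows / (left_rows * right_rows)

# Join A: tiny inputs, a 1-row output.
sel_a = selectivity(1_000, 1_000, 1)
# Join B: huge inputs, a 100-million-row output.
sel_b = selectivity(10_000_000, 10_000_000, 100_000_000)
# Both keys equal 1e-6: the metric cannot distinguish a 1-row
# intermediate from a 100M-row one, so a selectivity-keyed greedy
# is indifferent between them.
```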
3. Plan-diversity via multiple greedy runs is more robust than tuning one key
GOO(combined) on JOB: gmean 0.953 (beats DP on geometric mean), no regressions ≥10×, max 8.68×. Compare GOO(cost) alone: max 431×, 15 queries ≥5×. The lesson: since different greedy keys fail for uncorrelated reasons, running two or three and picking the cheapest final plan converts the individual worst cases into a near-best-of ensemble at modest extra cost.
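The candidate-generator / cost-selector split reduces to a very small pattern. A schematic sketch, with strategy names and cost numbers invented for illustration:

```python
def pick_plan(candidates, estimated_cost):
    """Combined-strategy selector: candidates come from independent
    greedy runs (e.g. one keyed by cost, one by result size); the
    plan with the lowest final estimated total cost wins."""
    return min(candidates, key=estimated_cost)

# Hypothetical candidates as (strategy, final estimated total cost):
plans = [("goo_cost", 1250.0), ("goo_result_size", 980.0)]
best = pick_plan(plans, estimated_cost=lambda p: p[1])
```

The robustness comes from the candidates failing for uncorrelated reasons, not from the selector being smart.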
4. The selector is itself a weakness
In JOB-Complex q28, q12, and 15a, a better GOO candidate existed (result_size) but the total_cost-based final selector chose the worse one. This exposes that any final selection based on the same cost model that failed to predict the good plan's quality is self-limiting. The CIDR 2021 "Simplicity Done Right" paper (Hertzschuch et al.), flagged by Tomas, proposes upper-bound (pessimistic / worst-case) estimates precisely for this reason — they're more resilient to the common nested-loop catastrophe where a very optimistic cardinality wins a planner contest and then the plan never completes.
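The upper-bound idea can be made concrete with two hypothetical candidates (all numbers invented): a nested loop whose point estimate is optimistic but whose worst case is catastrophic, versus a hash join that is slightly more expensive but bounded.

```python
# "point" is the optimizer's point estimate; "upper_bound" is a
# pessimistic estimate of what happens if the inner-side cardinality
# estimate is badly wrong.
plans = {
    "nestloop": {"point": 100.0, "upper_bound": 100_000.0},
    "hashjoin": {"point": 400.0, "upper_bound": 900.0},
}

pick_by_point = min(plans, key=lambda p: plans[p]["point"])
pick_by_bound = min(plans, key=lambda p: plans[p]["upper_bound"])
# Point estimates pick the fragile nested loop; upper bounds pick the
# robust hash join.
```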
5. Planning-time results are compelling but contextual
On JOB-Complex the v5 run shows GOO(combined) is ~5.2× faster to plan than GEQO. Lakshmi's synthetic 20-table chain joins show GOO at ~1.5ms vs DP ~48ms and GEQO ~51ms. However, Tomas correctly punctures these synthetic numbers: with every join on the same id column, there's one giant equivalence class, no order is actually forced, and the join_collapse_limit=8 default reduces DP's real problem size. Synthetic linear chains without meaningful selectivity variation tell us almost nothing about plan quality.
Key Disagreements and Framing
The most important framing intervention comes from Tomas and later John Naylor: GOO should not be evaluated against DP. DP is the gold standard when it's feasible; the relevant comparison is GEQO. The submitter gradually internalizes this — the v4 analysis separates the "GEQO range" (≥12 relations) subset of JOB, and v5 focuses entirely on JOB + JOB-Complex. John reinforces this in the final message: reporting small-join regressions is a distraction since those cases wouldn't trigger GEQO anyway.
A secondary disagreement concerns what "good enough" means. The submitter's v5 JOB-Complex numbers show GOO(combined) beats GEQO on execution-time sum (292s vs 430s, 1.47× faster) but has 4 queries ≥10× slower than GEQO. The submitter concludes this is not sufficient to replace GEQO as default. This is a mature judgment: a heuristic with severe tail risk cannot be the default, regardless of average wins.
The Pivot to Effort-Budgeted Hybrid Planning
The final message (2026-05-03) marks a direction change. Rather than pushing pure-GOO further, the submitter proposes a fixed-effort planner that runs exact DP while a budget allows, then completes the remaining problem with GOO contraction. This echoes prior community discussion (linked CAFBsxsE3Sb...) about smoother degradation instead of the current hard cliff at geqo_threshold. This direction is architecturally more promising because:
- It preserves DP's optimality for the portion of the search space that fits the budget.
- Greedy contraction only commits where DP could not afford to enumerate.
- Tuning reduces to a single knob (effort/budget) rather than picking among opaque strategies.
- It composes with future work on upper-bound estimates (CIDR 2021 U-block).
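The budgeted shape of the proposal can be sketched as follows. This is a toy model, not the prototype: it assumes a permutation-count budget, stands in left-deep permutations for real DP, gives the exact search to the largest relations, and uses a crude fixed-selectivity cardinality model.

```python
from itertools import permutations

def hybrid_order(rels, rows, budget):
    """Effort-budgeted hybrid sketch: exhaust join orders for as many
    relations as the budget affords, then attach the rest greedily,
    instead of a hard cliff at a relation-count threshold."""
    # Largest k whose k! left-deep orders fit the budget.
    k, fact = 1, 1
    while k < len(rels) and fact * (k + 1) <= budget:
        k += 1
        fact *= k
    head = sorted(rels, key=rows.get, reverse=True)[:k]  # exact search
    tail = [r for r in rels if r not in head]            # greedy finish

    def order_cost(order):
        # Sum of intermediate sizes under a fixed 10%-selectivity model.
        inter, total = rows[order[0]], 0
        for r in order[1:]:
            inter = inter * rows[r] * 0.1
            total += inter
        return total

    order = list(min(permutations(head), key=order_cost))
    while tail:
        nxt = min(tail, key=rows.get)  # greedy: smallest relation next
        order.append(nxt)
        tail.remove(nxt)
    return order

rows = {"a": 10, "b": 100, "c": 1000, "d": 5, "e": 7}
order = hybrid_order(list(rows), rows, budget=6)  # budget fits 3! orders
```

The point of the sketch is the degradation curve: the same knob smoothly trades exactness for planning time, with greedy contraction committing only where enumeration was unaffordable.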
Interaction with Other Planner Machinery
Several subtle integration points surfaced:
- Eager aggregation (the v1 crash): the greedy loop must still produce valid `RelOptInfo`s with populated pathlists. GOO cannot skip `set_cheapest()` because downstream machinery (grouping path generation, `sort_inner_and_outer`) assumes `cheapest_total_path` and related fields are non-NULL.
- Parallelism: all benchmarks used `max_parallel_workers_per_gather=0`. Parallel path generation interacts with join order (partial paths, gather placement) and has not been evaluated.
- `pg_plan_advice` integration: John Naylor highlights this as a strong acceptance criterion. A deterministic greedy is easier to steer with hints than a genetic search — this is a non-obvious but important argument for GOO over GEQO independent of raw performance.
- `join_collapse_limit`/`from_collapse_limit`: Tomas caught that Lakshmi's synthetic test used the defaults (8), meaning DP was only asked to solve an 8-way problem regardless of the 20 tables in the query text. This is a recurring testing pitfall.
Assessment
The thread is in a healthy state but not close to commit. The core GOO implementation appears stable (v5 passes make check-world), but:
- The value proposition as a direct GEQO replacement is unproven given the tail regressions.
- The more promising direction (effort-budgeted DP+greedy hybrid) is only prototyped.
- No committer has signaled intent to pick up the patch. Tomas is engaged as a reviewer and has domain expertise (he authored the star-join thread that inspired this) but hasn't endorsed commit.
- Integration questions (plan advice, parallelism, deterministic tie-breaking across platforms) are only partially addressed.
The intellectual contribution — the systematic evaluation of greedy keys, the identification of selectivity's scale-freeness, the diagnosis that the final selector is itself a weakness, and the pivot toward hybrid DP/greedy — is substantial regardless of whether this specific patch lands.