Wrong results in remove_useless_groupby_columns()

First seen: 2026-05-05 10:12:16+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-06 · opus 4.7

Wrong Results in remove_useless_groupby_columns() — Collation and Opfamily Mismatches

Core Problem

remove_useless_groupby_columns() is a planner optimization in src/backend/optimizer/plan/planner.c that exploits functional dependencies proven by unique indexes: if a base relation has a NOT NULL unique index on columns (c1, ..., ck) and all those columns appear in the GROUP BY list, then any other GROUP BY columns from the same relation are functionally dependent on the unique key and can be dropped. Removing columns from GROUP BY shrinks the sort/hash key and can enable additional plan shapes.

The bug Richard Guo identifies is that the matching between index key columns and GROUP BY columns is done by attno alone. This ignores two crucial semantic dimensions of equality in PostgreSQL:

  1. Collation: For collatable types (text/varchar/etc.), "uniqueness" is collation-dependent when the collation is nondeterministic. Under ICU's und-u-ks-level2 (case-insensitive), 'foo' and 'FOO' are equal; under "C" they are distinct. A unique index built under "C" proves no uniqueness for the column's own collation ci.
  2. Operator family (opfamily): The equality semantics of a unique index are those of its index opclass's opfamily. GROUP BY, in contrast, uses the SortGroupClause.eqop, which is normally the type's default equality (via the default btree/hash opfamily). When these differ — the canonical example being record_image_ops (bytewise) vs. the default record_ops (logical per-field equality) — the index's notion of distinctness is strictly finer than GROUP BY's, so rows the index considers distinct may collapse into one group.

Why this produces wrong results

Both reproducers exploit the same pattern:

This is a correctness bug, not merely a missed optimization: the query returns fewer rows than required by the grouping semantics.

Architectural Significance

This is part of a broader class of bugs Richard has been mining (he opens with "I seem to have fallen into a rabbit-hole around equality-relation issues"), where planner code written in an era of deterministic collations and implicit single-opfamily-per-type assumptions has not been systematically audited for:

Any planner proof based on "column X uniquely determines column Y" must be checked against the specific equality operator that the consuming clause (GROUP BY, DISTINCT, join, semijoin) will use. Unique indexes, EquivalenceClasses, and functional-dependency reasoning all need opfamily + collation qualifiers.

Similar bugs have been found in:

Proposed Fix

Richard's patch (attached but not quoted in the email) conceptually must extend the matching loop inside remove_useless_groupby_columns() so that an index column i matches a GROUP BY item only when:

  1. index->indexkeys[i] == var->varattno (existing check), and
  2. The index column's collation equals the Var's collation (or the column is non-collatable), and
  3. The index column's opfamily contains the SortGroupClause.eqop — i.e., the GROUP BY equality operator is a member of the same opfamily that the unique index relies upon for its uniqueness guarantee.

Condition (3) is the standard PostgreSQL idiom for "these two equality notions agree," used in many places (e.g., op_in_opfamily, get_mergejoin_opfamilies). Condition (2) mirrors what was done elsewhere for nondeterministic collations: compare indcollation[i] against the Var's varcollid.

Key Tradeoffs and Design Considerations

Technical Insights