Wrong Results in remove_useless_groupby_columns() — Collation and Opfamily Mismatches
Core Problem
remove_useless_groupby_columns() is a planner optimization in src/backend/optimizer/plan/planner.c that exploits functional dependencies proven by unique indexes: if a base relation has a NOT NULL unique index on columns (c1, ..., ck) and all those columns appear in the GROUP BY list, then any other GROUP BY columns from the same relation are functionally dependent on the unique key and can be dropped. Removing columns from GROUP BY shrinks the sort/hash key and can enable additional plan shapes.
The bug Richard Guo identifies is that the matching between index key columns and GROUP BY columns is done by attno alone. This ignores two crucial semantic dimensions of equality in PostgreSQL:
- Collation: For collatable types (text/varchar/etc.), "uniqueness" is collation-dependent when the collation is nondeterministic. Under ICU's
und-u-ks-level2(case-insensitive),'foo'and'FOO'are equal; under"C"they are distinct. A unique index built under"C"proves no uniqueness for the column's own collationci. - Operator family (opfamily): The equality semantics of a unique index are those of its index opclass's opfamily. GROUP BY, in contrast, uses the
SortGroupClause.eqop, which is normally the type's default equality (via the default btree/hash opfamily). When these differ — the canonical example beingrecord_image_ops(bytewise) vs. the defaultrecord_ops(logical per-field equality) — the index's notion of distinctness is strictly finer than GROUP BY's, so rows the index considers distinct may collapse into one group.
Why this produces wrong results
Both reproducers exploit the same pattern:
- Two rows are distinct under the unique index's equality, so the index is legal.
- The two rows are equal under GROUP BY's equality, so they form a single group.
- The planner, trusting the index, declares the extra GROUP BY column
bfunctionally determined byaand drops it. - At execution, the two rows merge into one group, but
b— which took different values in those rows — is no longer aggregated/tracked, so one row'sbvalue is silently lost.
This is a correctness bug, not merely a missed optimization: the query returns fewer rows than required by the grouping semantics.
Architectural Significance
This is part of a broader class of bugs Richard has been mining (he opens with "I seem to have fallen into a rabbit-hole around equality-relation issues"), where planner code written in an era of deterministic collations and implicit single-opfamily-per-type assumptions has not been systematically audited for:
- Nondeterministic collations (added in PG12)
- Multiple opfamilies per type (especially
record_image_ops, array equality variants, JSON equality etc.)
Any planner proof based on "column X uniquely determines column Y" must be checked against the specific equality operator that the consuming clause (GROUP BY, DISTINCT, join, semijoin) will use. Unique indexes, EquivalenceClasses, and functional-dependency reasoning all need opfamily + collation qualifiers.
Similar bugs have been found in:
innerrel_is_unique()/ unique-join detection- DISTINCT removal
- Self-join elimination
- EquivalenceClass construction with nondeterministic collations
Proposed Fix
Richard's patch (attached but not quoted in the email) conceptually must extend the matching loop inside remove_useless_groupby_columns() so that an index column i matches a GROUP BY item only when:
index->indexkeys[i] == var->varattno(existing check), and- The index column's collation equals the Var's collation (or the column is non-collatable), and
- The index column's opfamily contains the
SortGroupClause.eqop— i.e., the GROUP BY equality operator is a member of the same opfamily that the unique index relies upon for its uniqueness guarantee.
Condition (3) is the standard PostgreSQL idiom for "these two equality notions agree," used in many places (e.g., op_in_opfamily, get_mergejoin_opfamilies). Condition (2) mirrors what was done elsewhere for nondeterministic collations: compare indcollation[i] against the Var's varcollid.
Key Tradeoffs and Design Considerations
- Conservative vs. precise: The simplest fix is to bail out whenever collation/opfamily doesn't match, losing the optimization for those cases. A more aggressive fix could still use the index when the GROUP BY equality is known to be a refinement of the index equality — but in general that is not safe (and for the wrong-results case here, the relationship is the opposite: index equality is finer than GROUP BY equality).
- Backpatching: Because this is a wrong-results bug (data loss in query output), it is a backpatch candidate to all supported branches, subject to whether the fix is minimal enough. The nondeterministic-collation aspect only applies from PG12; the opfamily aspect (with
record_image_ops) is older. - Test coverage: The two reproducers Richard provides are excellent regression test material and target the two independent failure axes.
Technical Insights
- The
SortGroupClause.eqopfield is the authoritative source of GROUP BY equality semantics and must drive any functional-dependency reasoning — attno alone is never sufficient in a system with pluggable opfamilies. record_image_opsis a recurring trap because it is the canonical example of a non-default opfamily on a commonly used type, and its bytewise equality is strictly finer than logical record equality (numericvalues with different display scales being the classic trigger).- Nondeterministic collations break a long-standing planner assumption that "same attno ⇒ same equality." Each site making that assumption must be audited.