Analysis: Hash-Accelerating MCV Tracking in compute_distinct_stats()
The Core Problem
PostgreSQL's ANALYZE machinery in src/backend/commands/analyze.c picks one of two "compute stats" functions per attribute, based on the type's operator class support:
- compute_scalar_stats() — used when the type has an ordering operator (<). This is the common path: most built-in scalar types (int, text, numeric, timestamp, etc.) go through here. It sorts the sample and derives MCVs plus a histogram from the sorted run-lengths.
- compute_distinct_stats() — used for equality-only types (types with = but no <). Because it cannot sort, it maintains a bounded track[] array of candidate MCVs, and for each sampled row it linearly scans the array looking for a match.
With statistics_target = N, track[] can grow to O(N) entries, and the inner match loop becomes O(N) per sampled row. Since the sample size itself scales with the statistics target (~300·N rows), the total work is O(N²). Chengpeng's benchmarks show this is not theoretical: at statistics_target = 10000, ANALYZE on an xid/aclitem column takes ~30–200 seconds, vs. sub-second with the patch.
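The quadratic shape is easy to see in a stripped-down sketch of the tracking loop. The TrackItem struct and function below are schematic stand-ins invented for this illustration, not the actual analyze.c code:

```c
#include <assert.h>

/* Simplified stand-in for compute_distinct_stats()'s tracking array. */
typedef struct TrackItem
{
    int     value;              /* stands in for the tracked Datum */
    int     count;              /* occurrences seen so far */
} TrackItem;

/*
 * For each sampled value, linearly scan track[] for a match.
 * With the array holding O(statistics_target) entries and the sample
 * holding ~300 * statistics_target rows, total work is O(target^2).
 */
static int
track_match(const TrackItem *track, int track_cnt, int value)
{
    for (int i = 0; i < track_cnt; i++)
        if (track[i].value == value)
            return i;           /* O(track_cnt) per sampled row */
    return -1;                  /* not currently tracked */
}
```

The patch leaves this scan in place as a fallback and adds a hash path in front of it.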
Architecturally, this is the same class of quadratic-MCV-scan issue the project has addressed before (e.g., commit 057012b205a EQJOINSEL_MCV_HASH_THRESHOLD in selfuncs.c), and the fix pattern is identical: maintain a side hash table to turn O(n) equality probes into O(1).
The Proposed Solution
The v1 patch makes two intertwined changes to compute_distinct_stats():
1. Hash-accelerated MCV lookup
When attstattarget >= 100 and the type's default hash opclass is compatible with the equality operator used for MCV tracking, a simplehash.h-based table is built mapping Datum → track[] slot. The lookup path becomes O(1) amortized; the fallback linear scan is retained for types without hash support or for low stat targets.
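simplehash.h generates a type-specialized table, so the real patch reads quite differently; as a self-contained illustration of the value-to-slot mapping it introduces, a toy open-addressing table behaves like this (all names below are invented for the sketch):

```c
#include <assert.h>
#include <string.h>

/* Toy open-addressing map: value -> track[] slot.  Illustration only;
 * the patch uses a simplehash.h-generated table keyed by Datum and
 * hashed via the type's default hash opclass. */
#define HASH_SIZE 64            /* power of two; assumed never full here */

typedef struct HashEntry
{
    int     value;
    int     slot;               /* index into track[] */
    int     used;
} HashEntry;

static int
hash_lookup(const HashEntry *tab, int value)
{
    unsigned    h = (unsigned) value * 2654435761u;  /* Fibonacci hashing */

    for (unsigned i = h & (HASH_SIZE - 1);; i = (i + 1) & (HASH_SIZE - 1))
    {
        if (!tab[i].used)
            return -1;          /* not tracked */
        if (tab[i].value == value)
            return tab[i].slot; /* O(1) amortized */
    }
}

static void
hash_insert(HashEntry *tab, int value, int slot)
{
    unsigned    h = (unsigned) value * 2654435761u;
    unsigned    i = h & (HASH_SIZE - 1);

    while (tab[i].used)         /* linear probing past collisions */
        i = (i + 1) & (HASH_SIZE - 1);
    tab[i].value = value;
    tab[i].slot = slot;
    tab[i].used = 1;
}
```

The key design point survives the simplification: the table stores a slot index, which is exactly why reordering track[] becomes expensive (see the cursor change below).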
The hash/equality compatibility check is non-trivial: you must verify that the equality operator the caller gave (mystats->eqopr) matches the semantics of the type's default hash opclass. Otherwise you could hash-miss pairs that the equality operator would consider equal (e.g., numeric where 1.0 = 1.00 but the binary representations hash differently — fortunately numeric has < and goes through the scalar path).
2. Cursor-based singleton eviction
The original code, when a new distinct value is seen and track[] is full, evicted the oldest singleton (count=1) from the head of the singleton region by shifting the tail — O(n) per new distinct value. The patch replaces this with a circular cursor (c1_cursor) over the count==1 region, making eviction O(1).
This second change is motivated by the hash path: with a hash table mapping Datum → slot index, a shift of the singleton subarray would require updating every affected hash entry's index, negating the benefit of O(1) lookup. The cursor avoids that.
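Under that design, eviction reduces to reading the cursor and advancing it. A minimal sketch, assuming the region layout described above (the names follow the thread's vocabulary; the code itself is schematic):

```c
#include <assert.h>

/*
 * Evict the oldest singleton in O(1) by reusing its slot in place.
 * Slots firstcount1..track_cnt-1 form the count==1 region; c1_cursor
 * is the "clock hand" that always points at the oldest singleton.
 */
static int
evict_singleton(int firstcount1, int track_cnt, int *c1_cursor)
{
    int     victim = *c1_cursor;

    /* advance the clock hand, wrapping within the singleton region */
    (*c1_cursor)++;
    if (*c1_cursor >= track_cnt)
        *c1_cursor = firstcount1;

    return victim;              /* caller overwrites track[victim] */
}
```

No elements move, so hash entries pointing at other slots stay valid; only the victim's entry needs to be replaced.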
The Subtle Correctness Issue: FIFO Equivalence
The heart of the review discussion revolves around preserving the original's FIFO eviction order even without physical shifts. The tricky interaction is between:
- firstcount1 — boundary index where the count==1 region begins
- c1_cursor — the "clock hand" pointing to the next singleton to evict
- The bubble-up step that runs when a tracked value is matched again: if a count=1 entry is matched (becomes count=2), it must move left past all other count=1 entries into the count>1 prefix. This physically shifts track[firstcount1..match_index-1] right by one slot.
If c1_cursor points inside that shifted range, the element it "tracks" has moved; failing to advance the cursor would cause the next eviction to kill a different (newer) singleton, breaking FIFO equivalence with the original code.
Chengpeng's worked example is the definitive explanation: singleton region [A,B,C,D], cursor at B; D is matched again and bubbles up past A,B,C. Without adjusting the cursor, the slot the cursor points to now holds A, and A would be evicted next instead of B.
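That example can be checked mechanically. The sketch below simulates the bubble-up over a four-slot singleton region, with indices local to the region; the <= condition mirrors the fix discussed in the thread, while everything else is invented for the illustration:

```c
#include <assert.h>
#include <string.h>

/*
 * Reproduce the worked example: singleton region holds A,B,C,D
 * (oldest first), the cursor points at B.  When D is matched again it
 * bubbles left past A,B,C into the count>1 prefix; every element the
 * cursor could point at shifts right by one, so the cursor must
 * follow, or the next eviction would hit A instead of B.
 */
static void
bubble_up(char *region, int match, int *cursor)
{
    char    promoted = region[match];

    /* shift region[0..match-1] right by one; promoted moves to front */
    memmove(region + 1, region, match);
    region[0] = promoted;

    /* cursor lay inside the shifted range (or on the promoted slot
     * itself, hence <=): its element moved right by one */
    if (*cursor <= match)
        (*cursor)++;
}
```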
Two variants were debated:
- Loop form (v1/v2): while (firstcount1 < track_cnt && track[firstcount1].count > 1) firstcount1++; — a robust "invariant repair" that re-advances firstcount1 until it again points to the first true singleton.
- Single-step form (Ilia/Tatsuya): a single conditional firstcount1++ when a singleton was promoted, plus clamping c1_cursor.
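For concreteness, the loop form amounts to the following invariant repair (schematic names; the real code operates on TrackItem entries in place rather than a bare count array):

```c
#include <assert.h>

/*
 * Invariant repair: re-advance firstcount1 until it again points at
 * the first entry whose count is 1, or one past the end if the
 * count==1 region is empty.  Idempotent, so it is safe to run after
 * any promotion without tracking exactly what moved.
 */
static int
repair_firstcount1(const int *counts, int firstcount1, int track_cnt)
{
    while (firstcount1 < track_cnt && counts[firstcount1] > 1)
        firstcount1++;
    return firstcount1;
}
```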
Tatsuya (newcomer) caught a real bug in Ilia's first sketch (firstcount1-- should be ++, and the condition needs <= not <) — a nice instance of careful review catching direction-of-boundary errors. Chengpeng ultimately provided both forms split across patches so reviewers could compare, and also explained why the condition is <= match_index: the cursor can legitimately point at the promoted slot itself.
Why It Was Withdrawn: Scope Problem
The decisive technical objection came from John Naylor (committer), repeated across two messages: compute_distinct_stats() is a rarely-taken path. In practice the built-in types routed here are essentially xid, cid, and aclitem, plus user-defined equality-only types. Almost all real columns ANALYZE touches have < and go through compute_scalar_stats() — which is not fixed by this patch.
John's framing: "in any installation, the common path is going to vastly outnumber the rare path, so this patch is optimizing the wrong thing." This is an architectural judgment about where engineering attention should go, not a correctness criticism. A similar O(n²) issue likely exists (or existed) in compute_scalar_stats()'s own MCV-collection logic, and fixing that would dwarf the benefit here.
Chengpeng accepted the point and withdrew the patch from the CommitFest, leaving the door open if the ecosystem gains more widely used equality-only types (conceivable for some extension-defined domain types, network-ish types without a natural order, etc.).
Secondary Design Discussions
Threshold heuristic. Chengpeng initially gated hashing on attstattarget >= 100, modeled on EQJOINSEL_MCV_HASH_THRESHOLD. John pushed back: stat targets below 100 are essentially unheard of, so special-casing is unjustified code. Ilia separately noted the eqjoinsel threshold is a weak reference because the cost curves differ (datum width matters here: wider types like bytea have more expensive equality, shifting break-even). Chengpeng defended the threshold by pointing to vacuumdb --stage which temporarily sets default_statistics_target to 1 and 10 — a legitimate corner case but not a strong one.
Memory cleanup. The hash table is palloc'd in the per-column "Analyze Column" memory context that is reset between columns. No explicit destroy is required; this matches the existing compute_distinct_stats() comment ("We don't need to bother cleaning up any of our temporary palloc's"). Ilia initially flagged a missing destroy call; Chengpeng's explanation is correct and consistent with existing conventions.
Patch splitting. Ilia's request to split the cursor refactor from the hash addition is methodologically sound: the cursor change is a semantics-preserving algorithmic refactor that must be proven equivalent to the shift-based original, while hash lookup is a pure lookup optimization. Reviewing them together conflates two very different kinds of risk.
Architectural Takeaways
- Whenever an MCV tracking structure grows with statistics_target, check for O(n²) scans. The project has paid this tax multiple times.
- When adding a side index (hash table) over an array, any reordering of the array becomes expensive unless you redesign the update pattern (here: cursor-over-FIFO instead of physical shift).
- "Correct but rarely used" optimizations face a high bar: committers explicitly weigh expected aggregate benefit across installations vs. code-maintenance cost. This is why Naylor's objection carries the thread.
- The likely next frontier, if someone wants to generalize this work, is auditing compute_scalar_stats()'s own MCV-collection loop for similar behavior on the common path.