Set calcSumX2 = true in numeric_(poly_)deserialize

First seen: 2026-05-14 17:38:44+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-18 · claude-opus-4-6

Technical Analysis: Set calcSumX2 = true in numeric_(poly_)deserialize

Core Problem

This thread addresses a subtle but semantically important bug in PostgreSQL's aggregate deserialization functions for variance/standard deviation aggregates. The issue lies at the intersection of the aggregate state machine lifecycle and how external consumers (extensions) may interact with deserialized aggregate state.

Background: PostgreSQL's Aggregate Combine/Serialize/Deserialize Pipeline

PostgreSQL's parallel aggregation framework splits aggregate computation into phases:

  1. Transition (transfn): accumulates values into internal state
  2. Serialize (serialfn): converts internal state to bytea for inter-process transfer
  3. Deserialize (deserialfn): reconstructs internal state from bytea
  4. Combine (combinefn): merges two partial states

For variance-family aggregates (var_pop, var_samp, stddev_pop, stddev_samp, regr_*), the internal state tracks three quantities:

The calcSumX2 flag in the aggregate state structure controls whether do_numeric_accum() and do_int128_accum() will accumulate the squared-value sum. This flag exists because simpler aggregates (like avg) share the same state structure but don't need sumX2.

The Bug

Both numeric_poly_deserialize() and numeric_deserialize() construct their output state with calcSumX2 = false:

/* numeric_poly_deserialize */
result = makeInt128AggStateCurrentContext(false);
/* numeric_deserialize */
result = makeNumericAggStateCurrentContext(false);

This is semantically incorrect for variance/stddev aggregates. Within PostgreSQL's internal parallel aggregation pipeline, the bug is latent because:

However, for external consumers — particularly extensions that deserialize aggregate state and then feed additional values through the transition function — the bug is active: do_numeric_accum() / do_int128_accum() check state->calcSumX2 before accumulating squared values, so deserialized state will silently stop tracking sumX2 while N and sumX continue updating correctly. This produces incorrect variance/stddev results.

Architectural Significance

This is an example of a broader class of issues in PostgreSQL: functions that are "correct" only within the specific call graph the core executor uses, but violate semantic contracts that extensions reasonably rely on. The serialize/deserialize API is public and documented — extensions should be able to round-trip aggregate state through external storage and resume accumulation.

The fix also relates to the principle of least surprise: if numeric_combine already sets calcSumX2 = true (acknowledging the semantic meaning), the deserialize functions should be consistent.

Proposed Fix

The patch is minimal — a two-line change:

/* numeric_poly_deserialize: change false → true */
result = makeInt128AggStateCurrentContext(true);
/* numeric_deserialize: change false → true */
result = makeNumericAggStateCurrentContext(true);

Risk Assessment

Design Considerations

  1. Why was it false originally? Likely an oversight during the original implementation of parallel aggregation (circa PostgreSQL 9.6). The developer may have used false as a default without considering that the deserialized state's flag should match the aggregate's semantic requirements.

  2. Should the serialize format encode the flag? A more robust approach might serialize the calcSumX2 flag itself, but this would require a format change and is overkill given that the deserialize functions are already specific to variance-family aggregates (they're only registered as deserializers for those aggregates).

  3. Could this break anything? Setting the flag to true in deserialized state means that if someone were to call the transition function on deserialized state, it would now attempt to accumulate sumX2. Since the deserialized state already contains a valid sumX2 value (it was serialized from a complete state), this is correct behavior.

Open Questions