[PATCH] Fix overflow and underflow in regr_r2()

First seen: 2026-04-27 11:18:38+00:00 · Messages: 15 · Participants: 3

Latest Update

2026-06-04 · claude-opus-4-6

Incremental Update: Both Patches Committed

Dean Rasheed confirms that both patches (the regr_r2 and regr_intercept overflow/underflow fixes) were pushed to HEAD on 2026-06-03. There is no further technical discussion; this commit confirmation closes the thread.

No new technical content, position shifts, or participants. The thread is now complete.

History (2 prior analyses)
2026-06-01 · claude-opus-4-6

Monthly Summary: Fix Overflow and Underflow in regr_r2() (and regr_intercept())

Overview

This thread addresses numerical instability in PostgreSQL's regr_r2() and regr_intercept() aggregate functions, where intermediate floating-point products can overflow to infinity or underflow to zero even when the true mathematical result is perfectly representable. The motivating example: perfectly correlated data with values near 1e-100 returns NaN instead of 1.0 for regr_r2().

The work builds on a prior fix for corr() (commit 6498287696d in PostgreSQL 19) that addressed the same class of numerical problem.

Key Developments

regr_r2() — Stabilized Computation (v1 → v2)

Chengpeng Yan submitted v1 proposing a shared helper function extracted from corr(). Dean Rasheed refined this into v2 with significant improvements:

  • Rejected helper function abstraction — only a few lines of shared code; kept fix local to float8_regr_r2()
  • Removed premature Sxy == 0 fast-path — adds an unnecessary branch for an extremely rare case
  • Improved test cases — replaced 1e154 * g (platform/extra_float_digits sensitive) with 1e100 + g * 1e95 for reliable cross-platform reproduction

The stabilization mirrors corr(): when the direct computation yields zero or infinity, it falls back to a ratio-based formulation that avoids the raw products.
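
The fallback idea can be sketched in C as follows. This is an illustration under assumptions, not the committed code: the name r2_stable() is hypothetical, and the fallback expression (Sxy / Sxx) * (Sxy / Syy) is one plausible reading of "ratio-based". The sketch also assumes the caller has already dealt with Sxx == 0 and Syy == 0.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical sketch of a stabilized r^2 = Sxy^2 / (Sxx * Syy).
 * Preconditions (assumed to be handled by the caller): Sxx > 0,
 * Syy > 0, and all inputs finite.
 */
static double
r2_stable(double Sxx, double Syy, double Sxy)
{
    double num = Sxy * Sxy;
    double den = Sxx * Syy;

    assert(Sxx > 0.0 && Syy > 0.0);

    /* Direct form: used when neither product hit zero or infinity. */
    if (num != 0.0 && !isinf(num) && den != 0.0 && !isinf(den))
        return num / den;

    /* Fallback: divide first, so Sxy*Sxy and Sxx*Syy are never formed. */
    return (Sxy / Sxx) * (Sxy / Syy);
}
```

For the perfectly correlated points x = y = 1e-100, 2e-100, 3e-100, all three sums equal 2e-200. Both products underflow to zero, so the direct quotient would be 0/0 = NaN, while the fallback in this sketch returns 1.0.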

regr_intercept() — frexp/ldexp Decomposition

Tom Lane's audit identified regr_intercept() as vulnerable in the sub-expression Sx * Sxy / Sxx. Several approaches were debated:

  • Reordering parenthesization — Dean demonstrated counterexamples for both Sx * (Sxy / Sxx) and Sxy * (Sx / Sxx)
  • Log-space computation — Rejected due to extensive special-casing for zero/negative values
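
Why no fixed parenthesization is enough can be seen with a small C sketch. The function names and the input values below are illustrative assumptions chosen only to exercise the arithmetic; they are not Dean's counterexamples and need not correspond to reachable aggregate states.

```c
#include <math.h>

/* The three evaluation orders of Sx * Sxy / Sxx (hypothetical names). */
static double
order_direct(double Sx, double Sxy, double Sxx)
{
    return Sx * Sxy / Sxx;
}

static double
order_a(double Sx, double Sxy, double Sxx)
{
    return Sx * (Sxy / Sxx);
}

static double
order_b(double Sx, double Sxy, double Sxx)
{
    return Sxy * (Sx / Sxx);
}

/*
 * Illustrative inputs (Sx, Sxy, Sxx), each defeating one order:
 *   order_direct(1e-131, 4.5e-262, 4.5e-262): Sx * Sxy underflows to 0
 *   order_a(1e300, 1e-200, 1e200):            Sxy / Sxx underflows to 0
 *   order_b(1e-200, 1e250, 1e200):            Sx / Sxx underflows to 0
 * In every case the true quotient is a normal double, and at least one
 * of the other two orders computes it without trouble.
 */
```

Each ordering is safe for some inputs and loses the result for others, which is what motivates carrying the exponents separately instead of picking an order.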

Agreed solution: Use C99 frexp()/ldexp() for manual extended-exponent arithmetic:

if (offset == 0 || isinf(offset)) {
    m_Sx  = frexp(Sx,  &n_Sx);
    m_Sxy = frexp(Sxy, &n_Sxy);
    m_Sxx = frexp(Sxx, &n_Sxx);
    m_offset = m_Sx * m_Sxy / m_Sxx;
    offset = ldexp(m_offset, n_Sx + n_Sxy - n_Sxx);
}

Each nonzero mantissa returned by frexp() has magnitude in [0.5, 1.0), so the mantissa product and quotient can neither overflow nor underflow, and the exponents are combined with plain integer arithmetic. The approach is portable (C99 standard, already used in pg_prng.c), handles signs correctly, and requires minimal special-casing.
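
A self-contained version of the same idea, as a sketch: scaled_ratio() is a hypothetical helper name, not a function from the patch, and it assumes finite inputs with Sxx != 0.

```c
#include <math.h>

/*
 * Hypothetical sketch: compute Sx * Sxy / Sxx, retrying with separated
 * mantissas and exponents when the direct result is zero or infinite.
 * Assumes finite inputs and Sxx != 0.
 */
static double
scaled_ratio(double Sx, double Sxy, double Sxx)
{
    double result = Sx * Sxy / Sxx;

    if (result == 0.0 || isinf(result))
    {
        int    n_Sx, n_Sxy, n_Sxx;

        /* frexp() returns m with |m| in [0.5, 1), or 0 for a zero input. */
        double m_Sx = frexp(Sx, &n_Sx);
        double m_Sxy = frexp(Sxy, &n_Sxy);
        double m_Sxx = frexp(Sxx, &n_Sxx);

        /* The mantissa quotient is 0 or has magnitude in (0.25, 2). */
        double m = m_Sx * m_Sxy / m_Sxx;

        /* Reapply the combined exponent in a single scaling step. */
        result = ldexp(m, n_Sx + n_Sxy - n_Sxx);
    }
    return result;
}
```

With Sx = 1e-131 and Sxy = Sxx = 4.5e-262, the direct product underflows to zero, while the fallback returns approximately 1e-131. If the true quotient is itself out of range, ldexp() still yields zero or infinity, so the fallback only recovers results that are representable.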

Backpatch Decision

Consensus: no backpatch. No prior user complaints, and consistency with the corr() fix argues for shipping together in v19. RMT concurrence was sought for inclusion in v19.

Current Status

The v2 patches for both regr_r2() and regr_intercept() are in review. Chengpeng Yan provided LGTM for regr_r2() and minor feedback on regr_intercept():

  • Patch format issue preventing clean git apply
  • Naming preference (offset over dy)
  • Question about test coverage for underflow path and Inf/NaN guards

The thread is awaiting Dean Rasheed's response to these review comments.

Architectural Significance

This thread establishes the frexp/ldexp technique as a general pattern for avoiding intermediate overflow/underflow in ratio computations within PostgreSQL's floating-point aggregates — simpler and more robust than log-space approaches, with automatic sign handling and minimal special-casing.


2026-06-01 · claude-opus-4-6

Incremental Update: Final Patch Refinements and Imminent Commit

What's New

Dean Rasheed responds to Chengpeng Yan's review comments and signals intent to commit both patches within days.

Key technical additions:

  1. New underflow test case for regr_intercept — Dean adds a concrete test demonstrating the real-world impact of the frexp/ldexp fix:

    SELECT regr_intercept(y, x) FROM (VALUES (-1e-131, 0), (2e-131, 3e-131)) v(x, y);
    
    • Without patch: Sx * Sxy / Sxx underflows to zero, so the function returns Sy / N = 3e-131 / 2 = 1.5e-131 — a 50% relative error.
    • With patch: The frexp/ldexp fallback correctly computes the offset, returning 1e-131 (the true intercept).

    This is a stronger motivating example than was previously discussed because it shows a silently wrong but plausible-looking result rather than NaN — making the bug harder for users to detect.

  2. Variable naming decision — Dean keeps dy over Chengpeng's suggestion of offset, reasoning it's a common notation for difference-in-y and is well-documented by the accompanying comment.

  3. Inf/NaN guard testing declined — Dean explains the guard is defensive programming against technically uninitialized variables in the frexp decomposition path, not a path that can be reached with real inputs from the accumulator (which only stores finite values). No test needed.

  4. Commit timeline — Both patches (regr_r2 and regr_intercept) will be pushed to HEAD only, provided the RMT raises no objections; Dean expects to do so within "a couple of days."

  5. Cross-platform validation — Chengpeng confirms all regression tests pass on Apple Silicon (ARM64), adding confidence to platform portability of the frexp/ldexp approach.
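
The numbers in item 1 can be checked by hand. The derivation below assumes the usual definitions Sx = Σx, Sy = Σy, Sxx = Σ(x − x̄)², Sxy = Σ(x − x̄)(y − ȳ), and an intercept of the form (Sy − Sx·Sxy/Sxx)/N, which is the form implied by the unpatched result Sy / N.

```latex
\begin{align*}
N &= 2, \qquad \bar{x} = 0.5\times10^{-131}, \qquad \bar{y} = 1.5\times10^{-131} \\
S_x &= 1\times10^{-131}, \qquad S_y = 3\times10^{-131} \\
S_{xx} &= 2\,(1.5\times10^{-131})^2 = 4.5\times10^{-262} \\
S_{xy} &= 2\,(1.5\times10^{-131})^2 = 4.5\times10^{-262} \\
S_x S_{xy} &= 4.5\times10^{-393} \;\ll\; 4.9\times10^{-324}
  \quad\text{(smallest positive double), so the product rounds to } 0 \\
\text{unpatched:}\quad & (S_y - 0)/N = 1.5\times10^{-131} \\
\text{exact:}\quad & (S_y - S_x S_{xy}/S_{xx})/N
  = (3\times10^{-131} - 1\times10^{-131})/2 = 1\times10^{-131}
\end{align*}
```

The exact quotient Sx·Sxy/Sxx = 1e-131 is an ordinary double; only the intermediate product falls out of range.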

Assessment

The thread is effectively concluded. Both patches are in final form with reviewer sign-off. No outstanding technical disagreements remain.