2026-06-01 · claude-opus-4-6
Monthly Summary: Fix Overflow and Underflow in regr_r2() (and regr_intercept())
Overview
This thread addresses numerical instability in PostgreSQL's regr_r2() and regr_intercept() aggregate functions, where intermediate floating-point products can overflow to infinity or underflow to zero even when the true mathematical result is perfectly representable. The motivating example: perfectly correlated data with values near 1e-100 returns NaN instead of 1.0 for regr_r2().
The work builds on a prior fix for corr() (commit 6498287696d in PostgreSQL 19) that addressed the same class of numerical problem.
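The failure mode described in this overview can be reproduced outside the server. The sketch below is a minimal illustration, not PostgreSQL's code: naive_r2() is a hypothetical helper that accumulates the sums with a simple two-pass loop and assumes the direct formula (Sxy * Sxy) / (Sxx * Syy).

```c
#include <math.h>

/*
 * Hypothetical helper, for illustration only: computes R^2 directly as
 * (Sxy * Sxy) / (Sxx * Syy) from two-pass sums of squares and products.
 */
static double
naive_r2(const double *x, const double *y, int n)
{
    double mx = 0.0, my = 0.0;
    double Sxx = 0.0, Syy = 0.0, Sxy = 0.0;

    /* first pass: means */
    for (int i = 0; i < n; i++)
    {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;

    /* second pass: sums of squared deviations and cross products */
    for (int i = 0; i < n; i++)
    {
        double dx = x[i] - mx;
        double dy = y[i] - my;

        Sxx += dx * dx;
        Syy += dy * dy;
        Sxy += dx * dy;
    }

    /* both products can underflow or overflow even when the ratio is fine */
    return (Sxy * Sxy) / (Sxx * Syy);
}
```

With x = y = {1e-100, 2e-100, 3e-100}, each sum is about 2e-200, both products underflow to zero, and the function returns NaN. The same data scaled to {1, 2, 3} returns 1.0.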
Key Developments
regr_r2() — Stabilized Computation (v1 → v2)
Chengpeng Yan submitted v1 proposing a shared helper function extracted from corr(). Dean Rasheed refined this into v2 with significant improvements:
- Rejected the helper-function abstraction: only a few lines of code would be shared, so the fix stays local to float8_regr_r2()
- Removed the premature Sxy == 0 fast path, which adds an unnecessary branch for an extremely rare case
- Improved test cases: replaced 1e154 * g (sensitive to platform and extra_float_digits) with 1e100 + g * 1e95 for reliable cross-platform reproduction
The stabilization mirrors corr(): when the direct computation yields zero or infinity, fall back to a ratio-based formulation that avoids raw products.
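A minimal sketch of that fallback follows. It is an assumption about the shape of the fix, not the text of the v2 patch: stable_r2() is a hypothetical name, the check is applied to the two raw products, and the caller is assumed to have already handled Sxx == 0 and Syy == 0.

```c
#include <math.h>

/*
 * Hypothetical sketch: R^2 with a ratio-based fallback.
 * Assumes Sxx > 0 and Syy > 0 (degenerate cases handled by the caller).
 */
static double
stable_r2(double Sxx, double Syy, double Sxy)
{
    double num = Sxy * Sxy;
    double den = Sxx * Syy;

    /* a zero or infinite product suggests underflow or overflow */
    if (num == 0 || den == 0 || isinf(num) || isinf(den))
        return (Sxy / Sxx) * (Sxy / Syy);   /* no raw products formed */

    return num / den;
}
```

For Sxx = Syy = Sxy = 2e-200 the products underflow, the fallback is taken, and the result is 1.0. Ordinary inputs such as (4.0, 1.0, 1.0) take the direct path and return 0.25.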
regr_intercept() — frexp/ldexp Decomposition
Tom Lane's audit identified regr_intercept() as vulnerable in the sub-expression Sx * Sxy / Sxx. Several approaches were debated:
- Reordering parenthesization: Dean demonstrated counterexamples for both Sx * (Sxy / Sxx) and Sxy * (Sx / Sxx)
- Log-space computation: rejected due to extensive special-casing for zero/negative values
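To see why no fixed parenthesization is safe, the sketch below evaluates both orderings. The function names and the input values are hypothetical, chosen purely for the arithmetic; they are not the counterexamples posted in the thread and may not correspond to any reachable aggregate state.

```c
#include <math.h>

/* Hypothetical helper: divide Sxy by Sxx first, then scale by Sx. */
static double
offset_sxy_first(double Sx, double Sxy, double Sxx)
{
    return Sx * (Sxy / Sxx);
}

/* Hypothetical helper: divide Sx by Sxx first, then scale by Sxy. */
static double
offset_sx_first(double Sx, double Sxy, double Sxx)
{
    return Sxy * (Sx / Sxx);
}
```

With Sx = 1e300, Sxy = 1e-200, Sxx = 1e200 the true value is about 1e-100, but offset_sxy_first() returns 0 because Sxy / Sxx underflows. Swapping the magnitudes of Sx and Sxy makes offset_sx_first() return 0 instead, so each ordering loses a representable result on some input.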
Agreed solution: Use C99 frexp()/ldexp() for manual extended-exponent arithmetic:
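    /* direct result underflowed to zero or overflowed: redo with scaled operands */
    if (offset == 0 || isinf(offset)) {
        /* split each operand into a mantissa and a power-of-two exponent */
        m_Sx = frexp(Sx, &n_Sx);
        m_Sxy = frexp(Sxy, &n_Sxy);
        m_Sxx = frexp(Sxx, &n_Sxx);
        /* combine mantissas in floating point, exponents in integers */
        m_offset = m_Sx * m_Sxy / m_Sxx;
        offset = ldexp(m_offset, n_Sx + n_Sxy - n_Sxx);
    }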
For nonzero finite operands each mantissa has magnitude in [0.5, 1.0), so m_Sx * m_Sxy / m_Sxx has magnitude in (0.25, 2.0) and can neither overflow nor underflow; the exponents are combined with plain integer arithmetic. This is portable (C99 standard, already used in pg_prng.c), handles signs correctly, and requires minimal special-casing.
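The technique can be tried in isolation with a self-contained wrapper. This is a sketch under stated assumptions: scaled_offset() is a hypothetical name, the direct expression is assumed to be Sx * Sxy / Sxx, and the inputs are assumed to be finite with Sxx nonzero.

```c
#include <math.h>

/*
 * Hypothetical wrapper: computes Sx * Sxy / Sxx, retrying with
 * frexp()/ldexp() scaling when the direct result is zero or infinite.
 * Assumes finite inputs and Sxx != 0.
 */
static double
scaled_offset(double Sx, double Sxy, double Sxx)
{
    double offset = Sx * Sxy / Sxx;

    if (offset == 0 || isinf(offset))
    {
        int    n_Sx, n_Sxy, n_Sxx;
        double m_Sx = frexp(Sx, &n_Sx);      /* Sx  = m_Sx  * 2^n_Sx  */
        double m_Sxy = frexp(Sxy, &n_Sxy);   /* Sxy = m_Sxy * 2^n_Sxy */
        double m_Sxx = frexp(Sxx, &n_Sxx);   /* Sxx = m_Sxx * 2^n_Sxx */

        /* mantissa ratio stays near 1; exponents add and subtract as ints */
        offset = ldexp(m_Sx * m_Sxy / m_Sxx, n_Sx + n_Sxy - n_Sxx);
    }
    return offset;
}
```

With Sx = Sxy = 1e200 and Sxx = 1e300 the direct product overflows to infinity, and the scaled path recovers a value close to 1e100. A genuinely zero Sx still yields 0, because frexp() returns a zero mantissa for a zero input.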
Backpatch Decision
Consensus: no backpatch. No prior user complaints, and consistency with the corr() fix argues for shipping together in v19. RMT concurrence was sought for inclusion in v19.
Current Status
The v2 patches for both regr_r2() and regr_intercept() are in review. Chengpeng Yan provided LGTM for regr_r2() and minor feedback on regr_intercept():
- Patch format issue preventing a clean git apply
- Naming preference (offset over dy)
- Question about test coverage for underflow path and Inf/NaN guards
The thread is awaiting Dean Rasheed's response to these review comments.
Architectural Significance
This thread establishes the frexp/ldexp technique as a general pattern for avoiding intermediate overflow/underflow in ratio computations within PostgreSQL's floating-point aggregates — simpler and more robust than log-space approaches, with automatic sign handling and minimal special-casing.