2026-05-27 · claude-opus-4-6

Technical Analysis: Defensive Hardening of sprintf → snprintf in pg_stat_statements

Core Problem

The generate_normalized_query() function in contrib/pg_stat_statements uses unbounded sprintf() to format query parameter placeholders (e.g., $1, $2, ..., $2147483647) into a pre-allocated buffer during query normalization. While the current buffer sizing logic does account for worst-case expansion — each original constant occupies at least 1 byte in the source query, while each replacement placeholder occupies at most 11 bytes ($ + up to 10 digits for INT_MAX) plus the optional squash comment " /*, ... */" — the safety guarantee is non-local: you must reason about allocation logic elsewhere in the function to verify that sprintf cannot overflow.

This is a classic defensive programming concern rather than a demonstrated vulnerability. The fundamental tension is between:

Minimal code changes (current approach relies on proven buffer math)
Defense-in-depth (making safety locally verifiable at the write site)

Architectural Context

pg_stat_statements Query Normalization Pipeline

When pg_stat_statements normalizes queries, it replaces literal constants with positional parameters ($N) to group semantically equivalent queries. The flow is:

Parser identifies constant locations (jstate->clocations[])
generate_normalized_query() allocates a buffer and iterates through constant locations
For each constant, it copies the preceding query text verbatim, then writes a $N placeholder
The buffer size is computed based on query_len plus worst-case expansion per constant

The sprintf call in question is the core write operation in step 3:

n_quer_loc += sprintf(norm_query + n_quer_loc, "$%d%s",
                      num_constants_replaced + 1 + jstate->highest_extern_param_id,
                      locs[i].squashed ? " /*, ... */" : "");

Why This Matters

The risk is future regression safety. If someone later modifies the buffer allocation logic (e.g., changes how highest_extern_param_id interacts with buffer sizing, or adds new formatting to the placeholder string), the unbounded sprintf silently becomes a heap buffer overflow. With snprintf, any such miscalculation results in truncation rather than memory corruption — a much more debuggable failure mode.

Proposed Changes

1. sprintf → snprintf Conversion

n_quer_loc += snprintf(norm_query + n_quer_loc,
                       norm_query_buflen - n_quer_loc + 1,
                       "$%d%s",
                       num_constants_replaced + 1 + jstate->highest_extern_param_id,
                       locs[i].squashed ? " /*, ... */" : "");

Technical note on the +1: The snprintf size parameter includes the null terminator. The +1 suggests the author is accounting for snprintf's convention where the size argument is the total buffer space including the trailing \0. This needs careful review — if norm_query_buflen already represents the usable space excluding the terminator, the +1 is correct; if it includes terminator space, this could allow a 1-byte overwrite. This is a subtle point that reviewers should verify against the buffer allocation code.

2. NULL Pointer Safety in gc_qtexts()

Initialize qbuffer to NULL at declaration
Set qbuffer = NULL after pfree() in both success and error (PG_CATCH) paths
Set qbuffer = NULL after pfree() in pgss_shmem_shutdown() error path

These prevent use-after-free if control flow is ever modified to access qbuffer after the free point. This follows PostgreSQL's general defensive pattern seen throughout the backend (e.g., setting pointers to NULL after pfree in resource cleanup paths).

Critical Review Points

Potential Issues with This Patch

snprintf return value semantics: On truncation, snprintf returns the number of characters that would have been written (excluding null terminator), not the number actually written. If truncation occurs, n_quer_loc would advance past the actual end of written data, causing subsequent writes to leave gaps or overflow. The patch should arguably add an assertion or clamp: n_quer_loc = Min(n_quer_loc + ret, norm_query_buflen).
Performance: The overhead of snprintf vs sprintf is negligible for this use case (one call per normalized constant, not in a tight loop), so this is not a concern.
Upstream philosophy: PostgreSQL historically has mixed views on "defensive" patches that don't fix actual bugs. Some committers prefer minimal code changes and rely on proven invariants rather than adding defensive checks that might mask actual logic errors. The counter-argument is that snprintf is now the project standard for new code.
The +1 in the size calculation needs verification against the actual allocation in generate_normalized_query() to confirm correctness.

Relationship to Broader PostgreSQL Hardening

PostgreSQL has been gradually replacing sprintf with snprintf throughout the codebase over many years. The src/port/snprintf.c provides a portable implementation. This patch aligns with that long-term direction, though contrib modules have historically received less scrutiny for such hardening.

Upstream vs. Downstream Context

The patch originates from Apache Cloudberry (a PostgreSQL derivative) where a maintainer suggested pushing this upstream rather than maintaining it as a downstream delta. This is a common and healthy pattern — downstream forks often identify hardening opportunities that benefit upstream. However, it also means the patch may not have been evaluated against upstream's specific coding standards and commit bar.

Assessment

This is a low-risk, low-reward patch. It doesn't fix a known bug, but it does align with modern defensive coding practices. The main technical risk is the snprintf return value handling on truncation — if the buffer sizing invariant ever breaks, the current patch would produce silently corrupted (truncated) normalized queries rather than detecting the error. A more robust version would add an Assert() that truncation never occurs, making the defensive intent explicit while catching violations during development.

[PATCH] Defensive hardening: Replace sprintf with snprintf in pg_stat_statements

Latest Update