Fix Unsafe PlannedStmt Access in pg_stat_statements
Core Problem
In PostgreSQL's pg_stat_statements extension, the pgss_ProcessUtility() hook has a well-documented invariant: after calling the next utility hook (which actually executes the utility statement), the PlannedStmt *pstmt pointer must not be dereferenced. This is because certain statements—particularly ROLLBACK and similar transaction-ending commands—can free the memory backing the PlannedStmt structure during execution.
The code has a longstanding comment explicitly warning about this:
/*
* CAUTION: do not access the *pstmt data structure again below here.
* If it was a ROLLBACK or similar, that data structure may have been freed.
*/
The established pattern is to copy all needed fields from pstmt into local variables before calling the next hook. This was already done for queryId, stmt_location, and stmt_len.
However, commit 3357471cf9f5e470dfed0c7919bcf31c7efaf2b9 introduced pstmt->planOrigin as a new field and added a reference to it in the pgss_store() call that occurs after the dangerous point—violating the invariant. This constitutes a use-after-free bug: if the utility statement being processed is a ROLLBACK (or triggers one internally, such as within a CALL to a procedure), the PlannedStmt may already be freed, and reading pstmt->planOrigin accesses deallocated memory.
Why This Matters Architecturally
-
Memory safety: This is a classic use-after-free. While it may not crash in most scenarios (the memory might still be mapped and contain stale-but-readable data), it is undefined behavior that Valgrind can detect and that could cause incorrect statistics recording or crashes under certain memory allocator behaviors.
-
Hook contract enforcement: PostgreSQL's hook-based extensibility model relies on implicit contracts about pointer lifetime. When these contracts are violated, it undermines the reliability of all extensions using the same hooks. The
pg_stat_statementsextension is one of the most widely deployed contrib modules, making this particularly impactful. -
Difficulty of detection: As Michael Paquier noted, the existing regression tests with Valgrind enabled did not catch this bug. It required a specific scenario—using the extended query protocol with a
ROLLBACKinside aCALLprocedure, executed twice—to trigger the actual freeing of thePlannedStmtmemory in a way that Valgrind could report as an invalid read. This highlights a testing gap for lifetime-sensitive code paths.
Proposed Solutions
Primary Fix: Save planOrigin to a Local Variable
Following the established pattern, the fix saves pstmt->planOrigin into a local variable (saved_planOrigin or similar) before the utility hook is invoked, then uses that saved value in the subsequent pgss_store() call. This is the minimal, correct fix that eliminates the use-after-free.
Defensive Enhancement: NULL-out pstmt After the Dangerous Point
Andres Freund suggested making the invariant violation fail loudly rather than silently: set pstmt = NULL immediately after the point where it becomes unsafe to dereference. This way, any future code that accidentally accesses pstmt after that point will immediately crash with a NULL pointer dereference (or trigger an assertion), rather than silently reading freed memory that might still appear valid.
This is a defensive programming technique that converts a subtle, hard-to-detect bug (use-after-free) into an obvious, immediately-catchable crash. All three participants agreed this was worthwhile, and Michael Paquier prepared a patch implementing it.
Test Coverage Addition
Michael Paquier also developed a test case that exercises the specific code path where the PlannedStmt gets freed during hook execution—a ROLLBACK inside a CALL procedure via the extended protocol. This test ensures the regression won't recur and validates the fix under Valgrind.
Key Design Decisions and Tradeoffs
- Comment vs. code enforcement: The existing comment was deemed "clear enough" by all participants, but comments alone cannot prevent future violations. The NULL-assignment approach provides mechanical enforcement without adding significant code complexity.
- Backport scope: The fix addresses a bug introduced by a specific commit (
3357471cf9f5), so the backport range is determined by where that commit landed. - No restructuring: Rather than restructuring the function to avoid the lifetime issue entirely (e.g., by saving all fields up front in a struct), the incremental approach of adding one more saved variable was preferred for minimal invasiveness.
Valgrind Report Details
The invalid read manifests as:
==28059== Invalid read of size 4
==28059== at 0x165E42DF: pgss_ProcessUtility (pg_stat_statements.c:1200)
==28059== by 0x158AFCD: ProcessUtility (utility.c:524)
==28059== by 0x1587F3B: PortalRunUtility (pquery.c:1149)
==28059== by 0x15877E9: FillPortalStore (pquery.c:1022)
The "size 4" read corresponds to planOrigin being an int/enum field. The call stack shows this occurring in the normal utility execution path through the portal infrastructure.