[PATCH] pg_stat_statements: add last_execution_start column

First seen: 2026-03-30 16:37:25+00:00 · Messages: 6 · Participants: 2

Latest Update

2026-05-14 · claude-opus-4-6

Technical Analysis: Adding last_execution_start to pg_stat_statements

The Core Problem

pg_stat_statements is PostgreSQL's most widely-used monitoring extension, yet it lacks a fundamental piece of temporal metadata: when was a given statement last executed? Without this, monitoring systems must resort to polling all counters (e.g., calls, total_exec_time) and diffing them to detect activity — an expensive and indirect approach. A simple last_execution_start timestamp would allow queries like WHERE last_execution_start >= $1 to instantly identify recently-active statements, enabling efficient incremental monitoring.

This patch is the successor to a prior stats_last_updated effort that was rejected primarily on performance grounds: calling GetCurrentTimestamp() (which issues a gettimeofday() syscall) inside the hot stats-accumulation path (pgss_store()) caused a measured 5–6% TPS regression on short-transaction workloads.

Architectural Context: Why This Is Harder Than It Looks

The GetCurrentStatementStartTimestamp() Shortcut

The v1 patch's key insight is that PostgreSQL already tracks statement start time in a process-local variable (stmtStartTimestamp), set at the beginning of each statement cycle. Reading this variable is essentially free — no syscall, no lock, just a memory read. This eliminates the performance concern that killed the predecessor patch.

Benchmark validation: v1 showed ~0.1% TPS difference (within noise), compared to the 5–6% regression of the GetCurrentTimestamp() approach.

The Deferred ExecutorEnd Problem

However, this elegant approach has a critical correctness flaw exposed by the extended query protocol. In the extended protocol (used by JDBC, libpq in pipeline mode, etc.), ExecutorEnd for a statement can be deferred until the next Bind message arrives. This means:

  1. Statement A starts at time T1, executes, but ExecutorEnd is deferred.
  2. Statement B starts at time T2, its Bind triggers ExecutorEnd for A.
  3. When pgss_store() runs for A (during its deferred ExecutorEnd), GetCurrentStatementStartTimestamp() now returns T2, not T1.

This is not a theoretical edge case — it's the normal code path for any client using the extended query protocol with multiple open portals or cursors with fetch sizes. Sami Seih demonstrated this concretely with a JDBC test showing two portals both getting the timestamp of the second portal's start time.

Evolution of Solutions

v1: Direct GetCurrentStatementStartTimestamp() in pgss_store()

v2: Bounded Static Array Indexed by Nesting Level

v3: Store Timestamp in EState (queryDesc->estate)

v4: Rebase Only

Key Design Decisions and Trade-offs

1. Statement Start Time vs. Clock Timestamp

The patch deliberately records the statement start timestamp (as known to PostgreSQL's statement lifecycle), not the wall-clock time at stats recording. This means:

2. Reset Semantics

last_execution_start is:

3. Core vs. Extension Boundary

A recurring tension in this thread is whether to modify core PostgreSQL structures (EState) for the benefit of a contrib extension. Pavlo initially wanted to avoid touching core ("I don't want to touch core functionality in this patch. I'm afraid it could lead to even more problems with accepting."), but Sami convincingly argued that EState is the correct architectural location, following existing patterns, and that the bounded-array alternative is fundamentally broken.

What Remains Unresolved

  1. Committer review: No committer has weighed in on whether adding a field to EState purely for pgss consumption is acceptable. This is the most likely point of contention during final review.
  2. Nested statement documentation: The thread agrees that toplevel = false statements showing the top-level start time needs documentation, but the adequacy of the current docs hasn't been reviewed by a docs specialist.
  3. No discussion of shared memory / serialization implications: The timestamp must survive pg_stat_statements file serialization across restarts — this should be verified but isn't discussed.

Impact Assessment

This is a high-value, low-risk change for the monitoring ecosystem. The zero-overhead approach (reading an already-available timestamp rather than issuing a syscall) elegantly solves the performance concern that blocked the predecessor. The v3 approach of storing in EState is architecturally sound and follows existing patterns. The primary risk is social/process: getting approval for the EState struct modification in core.