pgstat: Flush some statistics within running transactions, take 2

First seen: 2026-05-17 14:34:01+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-18 · claude-opus-4-6

pgstat: Flush Some Statistics Within Running Transactions, Take 2

Core Problem

PostgreSQL's statistics subsystem (pgstat) accumulates counters in backend-local memory and only flushes them to shared memory at transaction boundaries. This design creates a significant observability gap: for long-running transactions or workloads where real-time statistics visibility matters, the reported statistics can be stale by minutes or even hours. Users monitoring pg_stat_user_tables, pg_stat_io, pg_stat_wal, etc. cannot see activity from backends that are mid-transaction.

This is architecturally significant because:

  1. Monitoring blind spots: Long-running analytics queries or batch operations are invisible to monitoring tools until they commit/rollback.
  2. pg_stat_statements accuracy: Extensions that rely on the statistics infrastructure inherit the same delayed visibility, which makes real-time query performance analysis unreliable. This matters for the pg_stat_statements work being developed in parallel.
  3. Capacity planning: IO and WAL statistics being deferred means operators cannot react to resource pressure in real-time.
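
The visibility gap described above can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: ToyBackend, shared_blocks_read and the toy_* functions are hypothetical names invented for the illustration. Counters accumulate in a backend-local field and reach the "shared" total only when the transaction ends.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one backend; every name here is illustrative only. */
typedef struct ToyBackend
{
    bool in_transaction;
    long pending_blocks_read;   /* backend-local, invisible to monitoring */
} ToyBackend;

/* Stand-in for the value a monitoring view would report. */
static long shared_blocks_read = 0;

static void
toy_begin(ToyBackend *b)
{
    assert(!b->in_transaction);
    b->in_transaction = true;
}

/* Work done inside the transaction only touches local memory. */
static void
toy_read_block(ToyBackend *b)
{
    assert(b->in_transaction);
    b->pending_blocks_read++;
}

/* Only at the transaction boundary does the count become visible. */
static void
toy_commit(ToyBackend *b)
{
    assert(b->in_transaction);
    b->in_transaction = false;
    shared_blocks_read += b->pending_blocks_read;
    b->pending_blocks_read = 0;
}
```

In this model, calling toy_read_block() 1000 times between toy_begin() and toy_commit() leaves shared_blocks_read unchanged until the commit, which is the staleness a monitoring tool would observe.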

Previous Attempt and Design Evolution

The original thread (take 1) apparently proposed automatic periodic flushing of statistics mid-transaction. Michael Paquier's feedback redirected the approach toward an on-demand API rather than automatic flushing. This is a meaningful architectural choice: statistics are flushed mid-transaction only when a caller explicitly asks for it.

Proposed Solution

The patch introduces two APIs:

SQL API: pg_stat_flush_backend(pid)

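pg_stat_flush_backend(pid) asks the backend with the given PID to flush its pending statistics; when the target is another backend, the request is delivered by signaling it (see the cross-backend mechanism below). This contrasts with the existing pg_stat_force_next_flush(), which only marks, in the calling session, that a flush should happen at the next transaction boundary, so the flush is still deferred.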

C API for Extensions

A C-level function that flushes the calling backend only (no cross-backend signaling). This is specifically designed to support the pg_stat_statements improvements being developed in parallel, where the extension needs to flush its own statistics at precise moments.

Key Technical Design: Transactional vs. Non-Transactional Counters

The most architecturally interesting aspect is the selective flush for relation statistics:

Deferred (transaction-dependent) counters:

Tuple-level counters must be deferred because their final accounting depends on the transaction outcome. If a transaction inserts 1000 rows and then rolls back, those rows must never be reported as live. The n_live_tup / n_dead_tup estimates can therefore only be updated once commit or abort is known.

Immediately flushable counters:

Counters such as sequential scan counts and block reads reflect physical work already performed regardless of transaction outcome. A sequential scan happened whether or not the transaction commits, and block reads from disk are real IO that occurred.

Non-relation statistics (unconditionally flushed):

WAL and IO statistics have no transactional semantics: WAL written is WAL written, and IO performed is IO performed.
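
The split between deferred and immediately flushable counters can be sketched as follows. This is a simplified model under stated assumptions, not the patch's code: the struct layouts and the toy_* function names are hypothetical, and only three pending counters are modeled. A mid-transaction flush publishes the physical-work counters and leaves the pending insert count alone; the end-of-transaction step decides whether inserted rows count as live (commit) or dead (abort).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Backend-local pending counters (illustrative subset). */
typedef struct ToyPending
{
    long seq_scans;        /* physical work, safe to publish early */
    long blocks_read;      /* physical work, safe to publish early */
    long tuples_inserted;  /* outcome-dependent, must wait for commit/abort */
} ToyPending;

/* Values visible to monitoring (illustrative subset). */
typedef struct ToyShared
{
    long seq_scans;
    long blocks_read;
    long live_tuples;
    long dead_tuples;
} ToyShared;

/* Mid-transaction flush: publish only the non-transactional counters. */
static void
toy_flush_nontransactional(ToyPending *pending, ToyShared *shared)
{
    assert(pending != NULL && shared != NULL);
    shared->seq_scans += pending->seq_scans;
    shared->blocks_read += pending->blocks_read;
    pending->seq_scans = 0;
    pending->blocks_read = 0;
    /* pending->tuples_inserted is deliberately left untouched */
}

/* Transaction end: the outcome decides how inserted rows are accounted. */
static void
toy_end_transaction(ToyPending *pending, ToyShared *shared, bool committed)
{
    toy_flush_nontransactional(pending, shared);
    if (committed)
        shared->live_tuples += pending->tuples_inserted;
    else
        shared->dead_tuples += pending->tuples_inserted;
    pending->tuples_inserted = 0;
}
```

With 1000 pending inserts, an early call to toy_flush_nontransactional() changes neither live_tuples nor dead_tuples, and a later toy_end_transaction(..., false) adds the 1000 rows to dead_tuples only.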

Signal-Based Cross-Backend Flush Mechanism

The cross-backend flush via signaling raises several implementation considerations:

  1. Signal safety: The actual flush doesn't happen in the signal handler but at the next safe point (CHECK_FOR_INTERRUPTS), avoiding any reentrancy issues with shared memory access.
  2. Auxiliary process support: Auxiliary processes don't call CHECK_FOR_INTERRUPTS in the same way, so the patch hooks into their main loop iteration, a pattern already used for other deferred work in these processes.
  3. Best-effort semantics: There's an inherent race between signaling and the target actually flushing. The flush is not synchronous for cross-backend calls, which is appropriate for statistics (eventual consistency is acceptable).
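
The "set a flag in the handler, act at the next safe point" pattern from the list above can be sketched in standalone C. This is an illustration under assumptions, not the patch's implementation: the toy_* names are hypothetical, SIGUSR1 is picked only for the demo (a POSIX system is assumed), and toy_check_for_flush() merely plays the role that CHECK_FOR_INTERRUPTS plays in a backend.

```c
#include <assert.h>
#include <signal.h>

/* Set by the signal handler, consumed at the next safe point. */
static volatile sig_atomic_t toy_flush_pending = 0;

static long toy_local_count = 0;   /* backend-local pending statistics */
static long toy_shared_count = 0;  /* stand-in for shared memory */

/* The handler only records the request; it never touches shared state. */
static void
toy_flush_request_handler(int signo)
{
    (void) signo;
    toy_flush_pending = 1;
}

/* Install the handler for the demo signal. */
static void
toy_install_handler(void)
{
    void (*prev) (int) = signal(SIGUSR1, toy_flush_request_handler);

    assert(prev != SIG_ERR);
    (void) prev;
}

/* Safe point: perform the flush outside signal-handler context. */
static void
toy_check_for_flush(void)
{
    if (toy_flush_pending)
    {
        toy_flush_pending = 0;
        toy_shared_count += toy_local_count;
        toy_local_count = 0;
    }
}
```

After toy_install_handler(), delivering SIGUSR1 (for example with raise(SIGUSR1)) only sets the flag; the counters move when the process next reaches toy_check_for_flush(). The delay between the two steps is the best-effort window described in point 3.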

Relationship to pg_stat_statements Work

The C API is explicitly motivated by parallel work on pg_stat_statements improvements. This suggests a design where pg_stat_statements can flush its accumulated query statistics at statement completion rather than waiting for transaction end — critical for seeing individual statement costs within a multi-statement transaction.

Architectural Implications

This patch represents a shift in PostgreSQL's statistics philosophy from "strictly transaction-aligned reporting" to "report what you can as early as you can." The careful separation of transactional and non-transactional counters shows a mature understanding of the consistency requirements: you don't want to report phantom writes that might be rolled back, but there's no reason to defer reporting physical IO that already happened.