2026-05-18 · claude-opus-4-6

pgstat: Flush Some Statistics Within Running Transactions, Take 2

Core Problem

PostgreSQL's statistics subsystem (pgstat) accumulates counters in backend-local memory and only flushes them to shared memory at transaction boundaries. This design creates a significant observability gap: for long-running transactions or workloads where real-time statistics visibility matters, the reported statistics can be stale by minutes or even hours. Users monitoring pg_stat_user_tables, pg_stat_io, pg_stat_wal, etc. cannot see activity from backends that are mid-transaction.

This is architecturally significant because:

Monitoring blind spots: Long-running analytics queries or batch operations are invisible to monitoring tools until they commit/rollback.
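pg_stat_statements accuracy: Extensions that report through the pgstat infrastructure inherit the same delayed visibility. This matters for the pg_stat_statements work being developed in parallel, where per-statement costs inside a long transaction would otherwise stay hidden until the transaction ends.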
Capacity planning: IO and WAL statistics being deferred means operators cannot react to resource pressure in real-time.

Previous Attempt and Design Evolution

The original thread (take 1) apparently proposed automatic periodic flushing of statistics mid-transaction. Michael Paquier's feedback redirected the approach toward an on-demand API rather than automatic flushing. This is a meaningful architectural choice:

Automatic flushing would add overhead to every transaction and introduce timing-dependent behavior that's hard to reason about.
On-demand API gives control to the user/extension, follows the principle of least surprise, and avoids adding unconditional overhead.

Proposed Solution

The patch introduces two APIs:

SQL API: `pg_stat_flush_backend(pid)`

If pid matches the calling backend → flush occurs immediately (synchronous).
If pid is a different backend → the target is signaled and flushes at:
- Next CHECK_FOR_INTERRUPTS() for regular backends
- Next main-loop iteration for auxiliary processes (bgwriter, walwriter, checkpointer)

This contrasts with the existing pg_stat_force_next_flush() which only marks that a flush should happen at the next transaction boundary — still deferred.
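The difference between "flush at the next transaction boundary" and "flush now" can be sketched with a toy model. This is Python, not PostgreSQL code: `ToyBackend` and all of its method names are hypothetical, and it models only the behavior described above (the real function is part of a proposed patch and may differ).

```python
class ToyBackend:
    """Toy model of one backend's pending counters (hypothetical names)."""

    def __init__(self, shared):
        self.shared = shared            # dict standing in for shared-memory stats
        self.pending = {}               # backend-local, invisible to other sessions
        self.force_next_flush = False

    def count(self, name, n=1):
        self.pending[name] = self.pending.get(name, 0) + n

    def _flush(self):
        for name, n in self.pending.items():
            self.shared[name] = self.shared.get(name, 0) + n
        self.pending.clear()

    def request_flush_at_boundary(self):
        # Models the deferred behavior the text attributes to
        # pg_stat_force_next_flush(): only a flag is set here.
        self.force_next_flush = True

    def flush_now(self):
        # Models the "pid is the calling backend" case of the proposed
        # pg_stat_flush_backend(pid): the flush is synchronous.
        self._flush()

    def commit(self):
        # Simplification: the toy flushes at commit only when requested.
        if self.force_next_flush:
            self._flush()
            self.force_next_flush = False


shared = {}
backend = ToyBackend(shared)

backend.count("seq_scan")
backend.request_flush_at_boundary()
visible_mid_txn_deferred = dict(shared)     # still empty: nothing flushed yet

backend.flush_now()
visible_mid_txn_immediate = dict(shared)    # the scan is now visible mid-transaction
```

In the toy, the deferred request changes nothing until `commit()` runs, while `flush_now()` publishes the pending counter while the transaction is still open.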

C API for Extensions

A C-level function that flushes the calling backend only (no cross-backend signaling). This is specifically designed to support the pg_stat_statements improvements being developed in parallel, where the extension needs to flush its own statistics at precise moments.

Key Technical Design: Transactional vs. Non-Transactional Counters

The most architecturally interesting aspect is the selective flush for relation statistics:

Deferred (transaction-dependent) counters:

tuples_inserted, tuples_updated, tuples_deleted
live_tuples, dead_tuples estimates

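These must be deferred because their final accounting depends on the transaction outcome. If a transaction inserts 1000 rows and then rolls back, those rows end up as dead tuples rather than live ones, so the n_live_tup / n_dead_tup estimates cannot be settled until commit or abort. The insert/update/delete counts are tracked per (sub)transaction alongside them and are folded into the backend's pending totals only at that point. Note that in current PostgreSQL the n_tup_ins / n_tup_upd / n_tup_del counters record attempted actions, so a rolled-back insert is still counted once the transaction ends.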

Immediately flushable counters:

seq_scan, idx_scan (scan counts)
tuples_fetched, tuples_returned
blocks_hit, blocks_read (buffer access)
n_tup_hot_upd (HOT update counts)

These reflect physical work already performed regardless of transaction outcome. A sequential scan happened whether or not the transaction commits. Block reads from disk are real IO that occurred.
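A minimal sketch of the selective flush for one relation's counters, assuming the grouping given in the lists above. The function name `flush_relation`, the `IMMEDIATE` set, and the dict-based layout are illustrative choices, not taken from the patch.

```python
# Counters the text lists as safe to publish mid-transaction.
# Anything not in this set is treated as transaction-dependent.
IMMEDIATE = {
    "seq_scan", "idx_scan",
    "tuples_fetched", "tuples_returned",
    "blocks_hit", "blocks_read",
    "n_tup_hot_upd",
}


def flush_relation(pending, shared, at_transaction_end):
    """Move counters from pending to shared.

    Mid-transaction, only IMMEDIATE counters are moved. At transaction end,
    everything is moved. Returns True if some counters are still pending.
    """
    for name in list(pending):          # copy keys: we pop while iterating
        if name in IMMEDIATE or at_transaction_end:
            shared[name] = shared.get(name, 0) + pending.pop(name)
    return bool(pending)


pending = {"seq_scan": 2, "blocks_read": 40, "tuples_inserted": 1000}
shared = {}

# Mid-transaction flush: scan and block counters become visible,
# tuples_inserted stays behind in backend-local memory.
still_pending = flush_relation(pending, shared, at_transaction_end=False)

# Transaction-end flush: the deferred counter is published too.
done_pending = flush_relation(pending, shared, at_transaction_end=True)
```

One consequence of this design is visible in the return value: after a partial flush the relation entry still has pending data, so it has to stay on whatever list the backend uses to remember unflushed entries.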

Non-relation statistics (unconditionally flushed):

Function execution statistics
IO statistics (pg_stat_io)
WAL statistics (pg_stat_wal)
All other pending stats

These have no transactional semantics — WAL written is WAL written, IO performed is IO performed.

Signal-Based Cross-Backend Flush Mechanism

The cross-backend flush via signaling raises several implementation considerations:

Signal safety: The actual flush doesn't happen in the signal handler but at the next safe point (CHECK_FOR_INTERRUPTS), avoiding any reentrancy issues with shared memory access.
Auxiliary process support: These processes don't call CHECK_FOR_INTERRUPTS in the same way, so the patch hooks into their main loop iteration — a pattern already used for other deferred work in these processes.
Best-effort semantics: There's an inherent race between signaling and the target actually flushing. The flush is not synchronous for cross-backend calls, which is appropriate for statistics (eventual consistency is acceptable).
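The three points above can be sketched as a toy model in which the "signal handler" only records a request and the flush runs later at a safe point. `ToyProcess`, `handle_signal`, `safe_point`, and `flush_backend` are hypothetical names for illustration; no real signals are sent and this is not how the patch is necessarily structured.

```python
class ToyProcess:
    """Toy model of a backend or auxiliary process (hypothetical names)."""

    def __init__(self, pid, shared):
        self.pid = pid
        self.shared = shared            # dict standing in for shared-memory stats
        self.pending = 0                # backend-local counter
        self.flush_requested = False

    def handle_signal(self):
        # The handler does the minimum: record the request and touch
        # nothing in shared state.
        self.flush_requested = True

    def safe_point(self):
        # Stands in for CHECK_FOR_INTERRUPTS() in a regular backend, or for
        # the top of the main loop in an auxiliary process.
        if self.flush_requested:
            self.flush_requested = False
            self.shared[self.pid] = self.shared.get(self.pid, 0) + self.pending
            self.pending = 0


def flush_backend(caller, target):
    """Toy dispatcher for the two cases described for the SQL function."""
    if target is caller:
        target.flush_requested = True
        target.safe_point()             # synchronous for the caller's own stats
    else:
        target.handle_signal()          # returns before the target has flushed


shared = {}
a, b = ToyProcess(101, shared), ToyProcess(202, shared)
b.pending = 7

flush_backend(a, b)
seen_before_safe_point = shared.get(202, 0)   # request recorded, nothing flushed

b.pending += 3                                # work done after the signal
b.safe_point()
seen_after_safe_point = shared.get(202, 0)    # includes the later work as well
```

The gap between `flush_backend(a, b)` returning and `b.safe_point()` running is the race described above: the caller cannot assume the target's statistics are current when the call returns, only that they will be shortly.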

Relationship to pg_stat_statements Work

The C API is explicitly motivated by parallel work on pg_stat_statements improvements. This suggests a design where pg_stat_statements can flush its accumulated query statistics at statement completion rather than waiting for transaction end — critical for seeing individual statement costs within a multi-statement transaction.

Architectural Implications

This patch shifts PostgreSQL's statistics reporting from "strictly transaction-aligned" toward "report what you can as early as you can." Separating transactional from non-transactional counters preserves the consistency requirement that matters: row-level changes whose accounting depends on commit or abort stay deferred, while physical IO that has already happened is reported without waiting for the transaction to end.
