[PATCH] Expose checkpoint timestamp and duration in pg_stat_checkpointer

First seen: 2025-11-24 06:10:44+00:00 · Messages: 15 · Participants: 4

Latest Update

2026-05-18 · claude-opus-4-6

Technical Analysis: Exposing Checkpoint Timestamp and Duration in pg_stat_checkpointer

Core Problem

PostgreSQL's checkpointer process is critical to database reliability — it ensures dirty buffers are flushed to disk at regular intervals, bounding crash recovery time. However, monitoring checkpoint behavior has significant gaps:

  1. Cumulative-only statistics: pg_stat_checkpointer exposes only cumulative counters (write_time, sync_time, buffers_written, etc.). To determine the duration of any individual checkpoint, external tools must compute deltas between successive polls — a fragile approach when multiple checkpoints occur between sampling intervals.

  2. No checkpoint type visibility: The checkpoint completion log message (controlled by log_checkpoints) doesn't indicate whether the checkpoint was triggered by timeout, WAL volume (immediate/wal), or manual CHECKPOINT command. This makes it harder to correlate write patterns with trigger reasons.

  3. No direct "last event" gauge: Unlike some other subsystems, there's no way to ask "how long did the most recent checkpoint take?" via SQL without parsing server logs.

Proposed Solution

The patch adds two new fields to pg_stat_checkpointer:

Additionally, the patch modifies checkpoint completion log messages to include the checkpoint type (immediate/timed/manual), making log-based tools like pgbadger able to categorize checkpoints by trigger reason.

Implementation Details

The changes touch:

The key architectural decision is that these are gauge values (overwritten per checkpoint) rather than cumulative counters. This breaks the pattern used by all other fields in pg_stat_checkpointer and is the primary point of contention.

Key Technical Debate: Gauges vs. Counters in pg_stat Views

The Case For (Álvaro Herrera, Soumya)

The Case Against (Michael Banck)

Álvaro's Deeper Insight on Duration Semantics

Álvaro Herrera raised an important point: raw checkpoint duration is often misleading because much of it is intentional sleep (governed by checkpoint_completion_target). A 5-minute checkpoint might be perfectly healthy if the checkpointer spent 4.5 minutes sleeping to spread I/O. What's actually useful is active time vs. sleep time — how much time was spent doing real work versus throttling. The existing write_time and sync_time partially capture this, but only cumulatively. This insight suggests the patch may not go far enough in providing actionable data.

Patch Quality Issues

Architectural Context

The pg_stat_checkpointer view was refactored in PostgreSQL 15 (split from pg_stat_bgwriter). It uses the pgstat shared memory stats infrastructure introduced in PG15, where pending stats accumulate in backend-local memory and get flushed to shared memory. The proposed change works within this framework — PendingCheckpointerStats accumulates the values in LogCheckpointEnd() and they propagate through the normal stats flush path.

The gauge vs. counter tension reflects a broader question about PostgreSQL's monitoring philosophy. The current design strongly favors cumulative counters that external systems can differentiate. Adding gauges creates a hybrid model that may confuse the semantics of stats reset operations.

Current Status

The thread went dormant after December 2025. Soumya attempted to revive it in May 2026 but received no responses. The patch needs:

  1. Rebase on current HEAD
  2. Split into two patches (logging change + stats view change)
  3. Stronger justification for why gauges are appropriate here, addressing the "missed checkpoints" problem
  4. Potentially incorporating Álvaro's suggestion about active vs. sleep time decomposition