allow spread checkpoints when changing checksums online

First seen: 2026-05-04 13:42:17+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-06 · opus 4.7

Analysis: Reintroducing Spread Checkpoints for Online Checksum Changes

Background and Architectural Context

Online checksum switching, committed during the PostgreSQL 19 development cycle, lets an operator enable or disable data checksums on a running cluster via pg_enable_data_checksums() / pg_disable_data_checksums(). Previously, toggling checksums required running pg_checksums on an offline cluster or deciding at initdb time. The online facility works by:

  1. Persisting a state transition in the control file (e.g., inprogress-on, on, inprogress-off, off).
  2. For enabling: scanning every relation, reading each page, computing the checksum, and writing it back (WAL-logged as an FPI so standbys/crash recovery learn the new checksum value).
  3. Forcing checkpoints at state transition boundaries so that the on-disk state is guaranteed to reflect the new checksum regime before moving to the next phase. Without these checkpoints, a crash could leave pages with old (now-invalid) checksums under a control-file state that demands checksum verification.

The checkpoints are therefore not optional — they are a correctness barrier between phases of the state machine. The only question is how aggressively they should be driven.
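The phases and barriers described above can be sketched as a toy model. This is not PostgreSQL code: the class and method names are hypothetical, and the exact position of each checkpoint relative to each state change is an assumption made for illustration only.

```python
from enum import Enum


class ChecksumState(Enum):
    OFF = "off"
    INPROGRESS_ON = "inprogress-on"
    ON = "on"
    INPROGRESS_OFF = "inprogress-off"


# Transitions permitted in this toy model (an assumption, not the real rules).
ALLOWED = {
    (ChecksumState.OFF, ChecksumState.INPROGRESS_ON),
    (ChecksumState.INPROGRESS_ON, ChecksumState.ON),
    (ChecksumState.ON, ChecksumState.INPROGRESS_OFF),
    (ChecksumState.INPROGRESS_OFF, ChecksumState.OFF),
}


class ToyCluster:
    """Hypothetical stand-in for a cluster; tracks state and dirty pages."""

    def __init__(self, pages):
        self.state = ChecksumState.OFF
        self.pages = pages
        self.dirty = 0   # pages rewritten but not yet checkpointed
        self.log = []    # ordered record of what happened

    def _transition(self, new_state):
        if (self.state, new_state) not in ALLOWED:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.log.append(f"state:{new_state.value}")

    def _checkpoint(self):
        # Barrier: everything dirtied so far is on disk after this returns.
        self.dirty = 0
        self.log.append("checkpoint")

    def enable(self):
        self._transition(ChecksumState.INPROGRESS_ON)
        self._checkpoint()
        self.dirty += self.pages      # rewrite every page with a checksum
        self.log.append("rewrite")
        self._checkpoint()            # must complete before verification starts
        self._transition(ChecksumState.ON)

    def disable(self):
        # No page rewrite here: the checkpoint is the only heavy step.
        self._transition(ChecksumState.INPROGRESS_OFF)
        self._checkpoint()
        self._transition(ChecksumState.OFF)
```

In this model the state "on" is never reached while rewritten pages are still only in memory, which is the property the checkpoints exist to guarantee.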

The Core Problem

In an earlier iteration of the patch series (v20251201, referenced via Daniel Gustafsson's message [1]), the API exposed a boolean parameter named fast, which controlled whether the checkpoints issued during the transition used CHECKPOINT_FAST (the flag formerly named CHECKPOINT_IMMEDIATE) or a spread checkpoint paced by checkpoint_completion_target. Somewhere between that version and the committed code, the fast parameter was dropped and the implementation hard-coded CHECKPOINT_FAST. Tomas Vondra notes that the thread records no rationale for this removal; it appears to have been either an oversight during a simplification pass or a deliberate reduction of API surface that was never revisited.

The hard-coded fast behavior is problematic for two reasons Vondra articulates:

1. Interaction with cost-based throttling of the rewrite phase

The online checksum worker honors vacuum_cost_limit / vacuum_cost_delay semantics, allowing operators to deliberately slow the page-rewrite phase to minimize I/O impact on production workloads. A fast (immediate) checkpoint at the end of such a throttled rewrite phase is architecturally inconsistent: the user has explicitly asked for low-impact background work, and the final checkpoint then dumps a large volume of dirty buffers (including all the just-rewritten pages) as fast as the I/O subsystem will accept them. On a large, busy system this single synchronous event can dwarf the disruption of the throttled rewrite itself and defeat the purpose of throttling.
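A back-of-envelope model makes the mismatch concrete. The sketch below is an assumption-laden simplification: it ignores real I/O time, treats every page as having one fixed cost, and assumes a paced checkpoint spreads its writes evenly over checkpoint_timeout × checkpoint_completion_target. The function names and all input numbers are hypothetical.

```python
PAGE_SIZE_BYTES = 8192  # PostgreSQL's default block size


def throttled_pages_per_second(cost_limit, cost_delay_ms, cost_per_page):
    """Upper bound on pages/s under cost-based delay, ignoring I/O time."""
    if cost_delay_ms <= 0:
        return float("inf")  # a zero delay means throttling is disabled
    pages_per_sleep = cost_limit / cost_per_page
    return pages_per_sleep / (cost_delay_ms / 1000.0)


def paced_checkpoint_pages_per_second(dirty_pages, checkpoint_timeout_s,
                                      completion_target):
    """Average write rate if dirty pages are spread over the target window."""
    return dirty_pages / (checkpoint_timeout_s * completion_target)


if __name__ == "__main__":
    # Hypothetical settings: limit 200, delay 20 ms, 20 cost units per page.
    rate = throttled_pages_per_second(200, 20.0, 20)
    print(f"throttled rewrite: {rate:.0f} pages/s, "
          f"{rate * PAGE_SIZE_BYTES / 1e6:.1f} MB/s")
    spread = paced_checkpoint_pages_per_second(270_000, 300, 0.9)
    print(f"spread checkpoint: {spread:.0f} pages/s")
```

A fast checkpoint has no pacing target of this kind: its write rate is limited only by what the storage accepts, which is why it can undo the effect of a carefully throttled rewrite.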

2. The disable path has no amortizing rewrite work

When disabling checksums, no pages are rewritten and nothing is WAL-logged beyond the control-file state change. The entire cost of the operation is effectively the forced checkpoint(s). Arguing "the checkpoint is a small fraction of the total work" — which may be defensible for enabling on a huge database because the WAL volume from FPIs dominates — simply does not apply here. Forcing a fast checkpoint on a busy production system merely to flip a control-file flag is gratuitously disruptive.

The Proposed Patch

The patch is described as "mostly extracted from v20251201": it re-adds a fast boolean parameter to both pg_enable_data_checksums() and pg_disable_data_checksums(), threading it down to the RequestCheckpoint() calls so that CHECKPOINT_FAST is set conditionally rather than unconditionally. This is a small, mechanical change — the machinery already existed and was removed; the patch restores it.
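The shape of that change can be illustrated in a few lines. This is a Python sketch, not the patch itself: the flag values are illustrative rather than the ones PostgreSQL defines, and the assumption that the checkpoint request also carries force and wait flags is a guess made here, not something the message states.

```python
from enum import IntFlag


class CheckpointFlags(IntFlag):
    # Illustrative bit values only; not copied from PostgreSQL headers.
    FAST = 0x01    # write as quickly as possible instead of pacing
    FORCE = 0x02   # checkpoint even if little WAL has been written
    WAIT = 0x04    # caller blocks until the checkpoint completes


def checkpoint_flags(fast: bool) -> CheckpointFlags:
    """Build the request flags, adding FAST only when the caller asks for it."""
    flags = CheckpointFlags.FORCE | CheckpointFlags.WAIT
    if fast:
        flags |= CheckpointFlags.FAST
    return flags
```

The committed behavior corresponds to always calling checkpoint_flags(True); the patch makes the argument caller-controlled. In both cases the wait flag stays set, so the caller still blocks until the checkpoint finishes.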

Key Design Questions Raised

Default value of fast

Vondra chose fast=true to match:

He suspects, however, that most production deployments would prefer fast=false. This is the classic tension between "safe/fast default for small systems and tests" versus "least-surprise for large production clusters where these functions are most likely to be invoked deliberately and carefully." No consensus is expressed in this single message; it is explicitly flagged as an open question.

TAP test coverage

Exercising fast=false in the test_checksums TAP suite would be valuable for coverage but painful for test runtime because spread checkpoints wait on checkpoint_timeout × checkpoint_completion_target. Vondra notes a workaround — aggressively lowering checkpoint_timeout in the test cluster (as he did in the TAP tests in [2]) — but is unconvinced it is worth the complexity, especially since the checkpoints remain synchronous from the caller's perspective regardless of speed.
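The runtime concern follows from simple arithmetic. The sketch below assumes a spread checkpoint takes roughly checkpoint_timeout × checkpoint_completion_target to finish, as stated above; the function name is hypothetical and the real pacing also depends on WAL volume.

```python
def spread_checkpoint_target_seconds(checkpoint_timeout_s, completion_target):
    """Approximate duration a spread checkpoint aims for, in seconds."""
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("completion_target must be between 0 and 1")
    return checkpoint_timeout_s * completion_target


if __name__ == "__main__":
    # Stock defaults: checkpoint_timeout = 5min, completion target = 0.9.
    print(spread_checkpoint_target_seconds(300, 0.9))
    # Lowest allowed checkpoint_timeout (30s), as a test might configure it.
    print(spread_checkpoint_target_seconds(30, 0.9))
```

With stock settings each spread checkpoint would take on the order of four and a half minutes, and even at the minimum checkpoint_timeout it is still close to half a minute per checkpoint, which adds up across a multi-phase test.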

Technical Insights and Implications

Participant Dynamics

This is a single-message thread (as presented) authored by Tomas Vondra, a PostgreSQL committer with a long history in checkpoint, WAL, and storage work. He is essentially auditing a regression introduced between patch versions of a feature he helped review, and proposing to restore the lost behavior. The tone ("I don't buy that, for two reasons") indicates he is pushing back against an argument for always using fast checkpoints, likely made by whoever justified dropping the fast parameter, but he does not name that person and quotes no message making that argument.

The referenced message [1] points to Daniel Gustafsson (yesql.se), the primary author/maintainer of the online-checksums feature, whose response to this patch will likely determine its fate.