Changing the state of data checksums in a running cluster

First seen: 2024-07-03 06:41:01+00:00 · Messages: 142 · Participants: 15

Latest Update

2026-06-01 · claude-opus-4-6

Monthly Summary: Online Data Checksums — May 2026

Overview

May 2026 saw continued post-commit stabilization of the online data checksums feature. The major commit had already landed, and this month's activity focused on identifying and fixing race conditions discovered through aggressive stress testing, plus routine patch pushes for minor follow-up fixes.

Key Developments

Promotion Race Condition Identified

Tomas Vondra identified a missing EmitProcSignalBarrier() call when a standby is promoted while in the inprogress-on state. During promotion, StartupXLOG resets the checksum state to off via a direct XLogCtl update but never signals existing hot standby backends, leaving them with a stale LocalDataChecksumVersion. While not a data corruption risk in practice (neither inprogress-on nor off triggers verification failures), it violates the architectural invariant that all state changes flow through the barrier mechanism. The same issue exists on the inprogress-off path.
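
The stale-cache effect can be illustrated with a toy model. This is a sketch in Python, not PostgreSQL code: the Backend and Cluster classes, set_state, and demo_promotion are hypothetical names invented for the illustration, and the barrier is reduced to a synchronous loop over backends. It assumes only what the summary states, namely that each backend keeps a local copy of the checksum state and refreshes it when a barrier is emitted.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical hot standby backend with a process-local cached state."""
    local_version: str = "off"

    def absorb_barrier(self, shared_version: str) -> None:
        # Refresh the local cache from the shared source of truth.
        self.local_version = shared_version


@dataclass
class Cluster:
    """Hypothetical stand-in for shared memory plus the set of backends."""
    shared_version: str = "off"
    backends: list = field(default_factory=list)

    def set_state(self, new_version: str, emit_barrier: bool = True) -> None:
        self.shared_version = new_version
        if emit_barrier:
            # Toy barrier: every backend absorbs the new state immediately.
            for backend in self.backends:
                backend.absorb_barrier(new_version)

    def stale_backends(self) -> list:
        return [b for b in self.backends
                if b.local_version != self.shared_version]


def demo_promotion(emit_barrier: bool) -> int:
    """Return how many backends are stale after a simulated promotion."""
    cluster = Cluster(backends=[Backend(), Backend()])
    cluster.set_state("inprogress-on")                   # normal, barrier-driven change
    cluster.set_state("off", emit_barrier=emit_barrier)  # reset during promotion
    return len(cluster.stale_backends())
```

Calling demo_promotion(False) models the reported bug (both backends keep inprogress-on), while demo_promotion(True) models the behavior once a barrier accompanies the reset.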

Checkpointer/Worker Interleaving Non-Determinism

Tomas also reported that the checkpointer and datachecksum worker can interleave in ways that produce non-deterministic intermediate states. Specifically, checkpoint_redo can occur between XLogChecksums() WAL emission and the corresponding XLogCtl update, creating a window where WAL and shared memory disagree. A PoC fix using a new LWLock to serialize the critical section confirmed the hypothesis, though it was explicitly not proposed for commit (it caused deadlocks in some tests). No actual checksum failures or incorrect final states have been demonstrated from this race.
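
The window described above can be made concrete by enumerating schedules in a small model. This is a hedged sketch, not the PoC patch: the step names (wal_emit, shmem_update, redo) and the functions run, schedules, and disagreements are hypothetical, and the lock is modeled simply by forbidding the redo step from landing between the worker's two steps.

```python
def run(schedule):
    """Replay one schedule; return (WAL state, shared-memory state) seen at redo."""
    wal_state = "off"     # last checksum state written to WAL
    shmem_state = "off"   # state currently held in shared memory
    seen_at_redo = None
    for step in schedule:
        if step == "wal_emit":
            wal_state = "inprogress-on"
        elif step == "shmem_update":
            shmem_state = "inprogress-on"
        elif step == "redo":
            seen_at_redo = (wal_state, shmem_state)
    return seen_at_redo


def schedules(serialized):
    """All placements of the redo point relative to the worker's two steps."""
    worker = ["wal_emit", "shmem_update"]
    if serialized:
        # With the critical section serialized, redo is either before or after.
        return [["redo"] + worker, worker + ["redo"]]
    return [worker[:i] + ["redo"] + worker[i:] for i in range(len(worker) + 1)]


def disagreements(serialized):
    """Schedules in which WAL and shared memory disagree at the redo point."""
    return [s for s in schedules(serialized) if len(set(run(s))) > 1]
```

Without serialization, exactly one of the three schedules (redo between the two worker steps) shows WAL and shared memory disagreeing; with serialization, none do. The model says nothing about the deadlocks the PoC ran into, which depend on lock ordering that is not captured here.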

Minor Follow-up Patches Committed

Daniel Gustafsson pushed patches 0001-0003 (a small follow-up fix series) with minor tweaks, acknowledging contributions from two collaborators. The specific content of these patches was not detailed in the thread messages this month.

Status

The feature is in the post-commit hardening phase. Stress testing continues to uncover subtle race conditions in edge cases (promotion, crash recovery, checkpoint interleaving), but none of those identified this month poses a data corruption risk. The overall architecture (a state machine with barrier synchronization, XLogCtl as the source of truth, and checkpoint-record embedding for recovery consistency) remains sound.
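
For readers unfamiliar with the state machine mentioned above, a minimal sketch follows. The state names off, inprogress-on, and inprogress-off appear in this summary; the on state and the exact transition table are assumptions made for illustration, and the ALLOWED mapping and transition function are hypothetical names, not identifiers from the PostgreSQL source.

```python
# Assumed transition table: a simple cycle, plus a fallback from
# inprogress-on to off (the summary notes promotion resets the state to off).
ALLOWED = {
    "off": {"inprogress-on"},
    "inprogress-on": {"on", "off"},
    "on": {"inprogress-off"},
    "inprogress-off": {"off"},
}


def transition(current: str, new: str) -> str:
    """Return the new state if the move is allowed, else raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Under these assumptions, enabling checksums walks off, inprogress-on, on, and a direct jump from off to on is rejected.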

History (1 prior analysis)

2026-06-01 · claude-opus-4-6

Round Update: Promotion Race Fix Committed Before Beta1; DELAY_CHKPT_START Deferred to v20

This round covers the resolution of the issues identified in the previous round. The exchange is brief and confirmatory, with the key outcomes being:

1. Promotion Barrier Bug Fix Pushed

Daniel confirms he pushed the fix for the missing EmitProcSignalBarrier() on standby promotion (along with Tomas's other findings) ahead of the PostgreSQL 19 beta1 deadline. The buildfarm is reported happy. This was the three-patch series Daniel proposed in the previous round.

2. DELAY_CHKPT_START: Consensus to Leave for Now, Remove in v20

Daniel's position on DELAY_CHKPT_START is to leave it in place for v19 (erring on the safe side) and revisit removal in v20. His rationale: he cannot demonstrate it causing any error, but removing it carries risk this close to release. Tomas mildly disagrees, preferring to fix it for v19 to avoid confusing future readers, but doesn't push hard. Daniel acknowledges Tomas is "probably right" and commits to addressing it after beta1 ships.

3. Checkpointer/Worker Interleaving Lock

Daniel agrees that an improved locking protocol may be needed but wants more time to think it through. The issue remains unresolved.

4. Comment Fixes and Data Type Change Split

Daniel split Tomas's minor patch into two: one for comment fixes and one for a data type change. Both were apparently pushed as part of the pre-beta1 batch.