Incorrect checksum in control file with pg_rewind test

First seen: 2025-09-04 15:18:30+00:00 · Messages: 6 · Participants: 4

Latest Update

2026-05-14 · claude-opus-4-6

Incorrect Checksum in Control File with EXEC_BACKEND Builds

Core Problem

PostgreSQL's pg_control file is a critical data structure that stores database cluster state information (checkpoint LSN, timeline ID, system identifier, etc.). During server startup and normal operation, multiple processes need to read this file. The bug manifests as a torn read of the control file under the EXEC_BACKEND code path (used on Windows and optionally enabled for testing on Unix).

Why This Matters Architecturally

PostgreSQL has two process creation strategies:

  1. fork() (Unix default): Child processes inherit the parent's memory, including ControlFileData already loaded into LocalControlFile. The postmaster reads the control file once at startup when no concurrent writers exist, then all forked children share that consistent snapshot via memory inheritance.

  2. EXEC_BACKEND (Windows, or -DEXEC_BACKEND builds): Child processes are started as fresh executables (fork() followed by exec() on Unix, CreateProcess() on Windows) and must re-read shared state from disk or from serialized parameters. In this path, LocalProcessControlFile() reads pg_control directly from disk without holding ControlFileLock. It cannot hold the lock, because the process has not yet attached to shared memory at that point.

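The race condition occurs when an exec'd child reads pg_control from disk while another process, such as the startup process during recovery, is in the middle of rewriting it. The child can then see a mix of old and new bytes whose stored CRC no longer matches the contents.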

This results in FATAL: incorrect checksum in control file, crashing the affected backend and potentially triggering a cascade shutdown if restart_after_crash is off.
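To make the failure mode concrete, here is a minimal sketch of how a checksum exposes a torn read. Everything in it is hypothetical: DemoControl is a simplified stand-in for ControlFileData, the demo_* functions are illustrative names, and FNV-1a is used in place of PostgreSQL's actual CRC purely to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for ControlFileData (not the real layout). */
typedef struct DemoControl
{
    uint64_t system_identifier;
    uint64_t checkpoint_lsn;
    uint32_t timeline_id;
    uint32_t crc;               /* covers every byte before this field */
} DemoControl;

/* FNV-1a, used here only as a compact substitute for a real CRC. */
uint32_t
demo_checksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Writer side: stamp the checksum over the payload. */
void
demo_seal(DemoControl *c)
{
    c->crc = demo_checksum(c, offsetof(DemoControl, crc));
}

/* Reader side: recompute the checksum and compare with the stored one. */
bool
demo_is_valid(const DemoControl *c)
{
    return c->crc == demo_checksum(c, offsetof(DemoControl, crc));
}

/*
 * Simulate a reader that races with a writer: the first `split` bytes come
 * from the new image, the remaining bytes from the old image.
 */
void
demo_torn_read(const DemoControl *old_img, const DemoControl *new_img,
               size_t split, DemoControl *out)
{
    assert(split <= sizeof(DemoControl));
    memcpy(out, new_img, split);
    memcpy((unsigned char *) out + split,
           (const unsigned char *) old_img + split,
           sizeof(DemoControl) - split);
}
```

With two sealed images that differ only in checkpoint_lsn, a split at byte 16 yields the new LSN paired with the old checksum, so demo_is_valid() rejects it. A split at 0 or at sizeof(DemoControl) returns a complete image and passes.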

Observed Failure Scenario

In pg_rewind tests, after rewind completes and the primary restarts:

  1. The startup process begins recovery, updating the control file as it replays WAL
  2. The background writer (PID 2002307) is exec'd and calls LocalProcessControlFile()
  3. It reads a torn control file mid-write → CRC mismatch → FATAL
  4. Postmaster sees child crash, shuts down (restart_after_crash=off in tests)
  5. Test infrastructure detects startup failure → Bail out!

Proposed Solutions

Approach 1: Retry on CRC Failure (Original Patch by Maksim Melnikov)

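Modeled after commit 5725e4ebe7a936f724f21e7ee1e84e54a70bfd83, which solved the same torn-read problem for frontend programs (pg_controldata, pg_resetwal, etc.). The approach: when the CRC check fails, re-read the control file and verify it again rather than failing on the first mismatch.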

Limitation: Even with retries, different EXEC_BACKEND child processes may end up with different snapshots of the control file contents (read at different times), diverging from fork behavior where all children share the same snapshot inherited from the postmaster.
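The retry idea can be sketched roughly as follows. This is not the patch itself: RetryControl, FlakySource, the retry_* functions and the max_attempts parameter are all hypothetical, and FNV-1a again stands in for the real CRC. FlakySource simulates a file that returns a torn image a fixed number of times before a clean read succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified control-file image (not the real layout). */
typedef struct RetryControl
{
    uint64_t checkpoint_lsn;
    uint32_t timeline_id;
    uint32_t crc;               /* covers every byte before this field */
} RetryControl;

/* Callback that performs one read attempt from some source. */
typedef void (*RetryReadFn) (void *source, RetryControl *out);

/* FNV-1a, a compact substitute for a real CRC. */
uint32_t
retry_checksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

bool
retry_is_valid(const RetryControl *c)
{
    return c->crc == retry_checksum(c, offsetof(RetryControl, crc));
}

/*
 * Read until the checksum matches or max_attempts is exhausted.
 * Returns the number of attempts used, or 0 if every attempt failed.
 */
int
retry_read_control(RetryReadFn read_fn, void *source,
                   RetryControl *out, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        read_fn(source, out);
        if (retry_is_valid(out))
            return attempt;
        /* A real implementation would probably pause briefly here. */
    }
    return 0;
}

/* Test double: hands out a damaged image while torn_reads_left > 0. */
typedef struct FlakySource
{
    RetryControl good;
    int          torn_reads_left;
} FlakySource;

void
flaky_read(void *source, RetryControl *out)
{
    FlakySource *src = source;

    *out = src->good;
    if (src->torn_reads_left > 0)
    {
        src->torn_reads_left--;
        out->checkpoint_lsn ^= 1;   /* payload changes, stored crc is stale */
    }
}
```

Note that a successful retry only guarantees an internally consistent image. Two children that retry at different moments can still end up with different images, which is the limitation described above.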

Approach 2: Pass Control File via BackendParameters (Alexander Korotkov's Patch)

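A more architecturally sound solution that addresses both the torn-read problem AND the behavioral divergence between fork and EXEC_BACKEND. The postmaster, which already holds a consistent copy of the control file, hands that copy to each exec'd child through BackendParameters, so the child does not need to re-read pg_control from disk at startup.

Advantages: no torn read can occur at child startup because nothing is read from disk there; every child starts from the postmaster's snapshot, matching fork behavior; and the change reuses an existing mechanism.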

Trade-off: Slightly increases the size of BackendParameters, but ControlFileData is small (~300 bytes) so this is negligible.
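The shape of this approach can be sketched as below. This is an illustration under stated assumptions, not the actual patch: ParamControl, DemoBackendParameters, the listen_socket placeholder field and the demo_save_params/demo_restore_params functions are all hypothetical, and a plain byte buffer stands in for whatever channel carries the parameters to the exec'd child.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified control-file contents (not the real layout). */
typedef struct ParamControl
{
    uint64_t system_identifier;
    uint64_t checkpoint_lsn;
    uint32_t timeline_id;
} ParamControl;

/* Hypothetical parameter block handed from the postmaster to a child. */
typedef struct DemoBackendParameters
{
    int          listen_socket;     /* placeholder for the existing fields */
    ParamControl control_file;      /* snapshot taken by the postmaster */
} DemoBackendParameters;

/* Postmaster side: pack its consistent copy into the parameter block. */
void
demo_save_params(unsigned char *wire, size_t wire_len,
                 int listen_socket, const ParamControl *postmaster_copy)
{
    DemoBackendParameters param;

    assert(wire_len >= sizeof(param));
    memset(&param, 0, sizeof(param));   /* avoid copying stray padding bytes */
    param.listen_socket = listen_socket;
    param.control_file = *postmaster_copy;
    memcpy(wire, &param, sizeof(param));
}

/* Child side: unpack the block instead of reading the file from disk. */
void
demo_restore_params(const unsigned char *wire, size_t wire_len,
                    int *listen_socket, ParamControl *child_copy)
{
    DemoBackendParameters param;

    assert(wire_len >= sizeof(param));
    memcpy(&param, wire, sizeof(param));
    *listen_socket = param.listen_socket;
    *child_copy = param.control_file;
}
```

Because the child copies a finished snapshot rather than reading a file that may be mid-write, there is no checksum to fail at this point, and any two children restored from the same block see identical contents.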

Key Technical Context

The BackendParameters mechanism already exists for passing other postmaster state to exec'd children on Windows (signal handles, socket descriptors, shared memory attachment info, etc.). Adding the control file contents is a natural extension of this pattern.

The earlier thread referenced (commit 5725e4ebe) dealt with the frontend case where there's no postmaster to provide the data — retrying is the only option there. For the backend case under EXEC_BACKEND, the postmaster-mediated approach is superior because the postmaster already has a consistent copy.

Conclusion

Alexander Korotkov's approach of passing control file data through BackendParameters is the preferred solution. It eliminates the class of torn-read bugs for EXEC_BACKEND while simultaneously ensuring fork/exec behavioral parity — an important property for Windows platform correctness and for using EXEC_BACKEND as a testing/debugging tool on Unix.