Incorrect Checksum in Control File with EXEC_BACKEND Builds
Core Problem
PostgreSQL's pg_control file is a critical data structure that stores database cluster state information (checkpoint LSN, timeline ID, system identifier, etc.). During server startup and normal operation, multiple processes need to read this file. The bug manifests as a torn read of the control file under the EXEC_BACKEND code path (used on Windows and optionally enabled for testing on Unix).
Why This Matters Architecturally
PostgreSQL has two process creation strategies:
- fork() (Unix default): Child processes inherit the parent's memory, including ControlFileData already loaded into LocalControlFile. The postmaster reads the control file once at startup when no concurrent writers exist, then all forked children share that consistent snapshot via memory inheritance.
- EXEC_BACKEND (Windows, or -DEXEC_BACKEND builds): Child processes are created via exec() and must re-read shared state from disk or serialized parameters. In this path, LocalProcessControlFile() reads pg_control directly from disk without holding ControlFileLock, and indeed cannot hold it, because the process hasn't yet attached to shared memory at that point.
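As a rough illustration of the fork() side of this contrast, the sketch below loads a value into process memory and checks that a forked child sees the same value without reading anything from disk. It is not PostgreSQL code: ToyControlFile, toy_snapshot and toy_fork_child_sees_snapshot are hypothetical names invented for the example. An exec'd child starts with fresh memory and would have no such copy.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the postmaster's in-memory control file copy. */
typedef struct ToyControlFile
{
    uint64_t checkpoint_lsn;
    uint32_t timeline_id;
} ToyControlFile;

static ToyControlFile toy_snapshot;

/*
 * Load a snapshot in the parent, fork, and let the child compare what it
 * sees. Returns 1 if the child saw the parent's snapshot, 0 if it did not,
 * and -1 on error.
 */
static int
toy_fork_child_sees_snapshot(uint64_t lsn, uint32_t tli)
{
    pid_t pid;
    int   status;

    toy_snapshot.checkpoint_lsn = lsn;
    toy_snapshot.timeline_id = tli;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        /* Child: memory was inherited, so no re-read from disk is needed. */
        int same = (toy_snapshot.checkpoint_lsn == lsn &&
                    toy_snapshot.timeline_id == tli);

        _exit(same ? 0 : 1);
    }

    /* Parent: collect the child's verdict from its exit status. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling toy_fork_child_sees_snapshot(0x1000, 1) on a POSIX system should return 1, because the child receives a copy of the parent's address space at the moment of the fork.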
The race condition occurs when:
- The startup/recovery process (or any process holding ControlFileLock in shared memory) writes an updated control file
- A newly exec'd child process simultaneously reads the same file
- The child sees a partially-written file where the data and CRC checksum are inconsistent
This results in FATAL: incorrect checksum in control file, crashing the affected backend and potentially triggering a cascade shutdown if restart_after_crash is off.
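The way a torn read turns into a checksum failure can be sketched with a toy model. The struct layout and every toy_ name below are assumptions made for illustration, not the real ControlFileData or PostgreSQL's CRC routines. The CRC here is a plain bitwise CRC-32C, and the torn read is simulated by taking the leading bytes from the new image and the trailing bytes, including the stored CRC, from the old one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for ControlFileData. */
typedef struct ToyControlFile
{
    uint64_t system_identifier;
    uint64_t checkpoint_lsn;
    uint32_t timeline_id;
    uint32_t crc;               /* covers every byte before this field */
} ToyControlFile;

/* Bitwise CRC-32C (Castagnoli): slow, but short and dependency-free. */
static uint32_t
toy_crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t       crc = 0xFFFFFFFFu;

    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Writer side: stamp the image with a CRC over its data fields. */
static void
toy_set_crc(ToyControlFile *cf)
{
    cf->crc = toy_crc32c(cf, offsetof(ToyControlFile, crc));
}

/* Reader side: recompute the CRC and compare it with the stored one. */
static bool
toy_crc_ok(const ToyControlFile *cf)
{
    return cf->crc == toy_crc32c(cf, offsetof(ToyControlFile, crc));
}

/*
 * Simulate a torn read: the first `split` bytes come from the new image,
 * the remaining bytes from the old one.
 */
static ToyControlFile
toy_torn_read(const ToyControlFile *old_img, const ToyControlFile *new_img,
              size_t split)
{
    ToyControlFile out;

    assert(split <= sizeof(out));
    memcpy(&out, old_img, sizeof(out));
    memcpy(&out, new_img, split);
    return out;
}
```

If the old and new images differ only in checkpoint_lsn and the split falls at offsetof(ToyControlFile, timeline_id), the result carries the new LSN with the old CRC, so toy_crc_ok() returns false even though both source images were individually valid.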
Observed Failure Scenario
In pg_rewind tests, after rewind completes and the primary restarts:
- The startup process begins recovery, updating the control file as it replays WAL
- The background writer (PID 2002307) is exec'd and calls LocalProcessControlFile()
- It reads a torn control file mid-write → CRC mismatch → FATAL
- Postmaster sees child crash, shuts down (restart_after_crash=off in tests)
- Test infrastructure detects startup failure → Bail out!
Proposed Solutions
Approach 1: Retry on CRC Failure (Original Patch by Maksim Melnikov)
Modeled after commit 5725e4ebe7a936f724f21e7ee1e84e54a70bfd83, which solved the same torn-read problem for frontend programs (pg_controldata, pg_resetwal, etc.). The approach:
- If LocalProcessControlFile() detects a CRC mismatch, sleep briefly and retry the read
- This is a pragmatic fix: the write is atomic at the filesystem level for aligned 8KB writes on most systems, but not guaranteed by POSIX, so retrying covers the window in which a reader can observe a partially written file
Limitation: Even with retries, different EXEC_BACKEND child processes may end up with different snapshots of the control file contents (read at different times), diverging from fork behavior where all children share the same snapshot inherited from the postmaster.
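The general shape of a retry-on-CRC-failure loop might look like the sketch below. This is not the patch itself: ToySource, toy_read and toy_read_with_retry are hypothetical names, the "disk" is an in-memory array that yields one image per attempt, and the pause between attempts is left as a comment so the example stays portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-reduced stand-in for ControlFileData. */
typedef struct ToyControlFile
{
    uint64_t checkpoint_lsn;
    uint32_t timeline_id;
    uint32_t crc;               /* covers every byte before this field */
} ToyControlFile;

/* Bitwise CRC-32C (Castagnoli), kept short for the example. */
static uint32_t
toy_crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t       crc = 0xFFFFFFFFu;

    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

static bool
toy_crc_ok(const ToyControlFile *cf)
{
    return cf->crc == toy_crc32c(cf, offsetof(ToyControlFile, crc));
}

/*
 * Hypothetical source yielding one image per read attempt; it stands in for
 * re-reading pg_control while a writer is active. Once the images run out,
 * the last one is returned again.
 */
typedef struct ToySource
{
    const ToyControlFile *images;
    int                   nimages;
    int                   next;
} ToySource;

static void
toy_read(ToySource *src, ToyControlFile *out)
{
    int idx;

    assert(src->nimages > 0);
    idx = src->next < src->nimages ? src->next : src->nimages - 1;
    *out = src->images[idx];
    src->next++;
}

/* Re-read until the CRC verifies or max_attempts reads have been made. */
static bool
toy_read_with_retry(ToySource *src, ToyControlFile *out, int max_attempts,
                    int *attempts_used)
{
    *attempts_used = 0;
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        toy_read(src, out);
        *attempts_used = attempt;
        if (toy_crc_ok(out))
            return true;
        /* A real implementation would sleep briefly here before retrying. */
    }
    return false;               /* caller reports the checksum failure */
}
```

The bound on attempts matters: a file that is genuinely corrupt, rather than torn, never verifies, so the loop has to give up and report the error.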
Approach 2: Pass Control File via BackendParameters (Alexander Korotkov's Patch)
A more architecturally sound solution that addresses both the torn-read problem AND the behavioral divergence between fork and EXEC_BACKEND:
- Serialize the ControlFileData contents into the BackendParameters structure that the postmaster already passes to exec'd children
- Child processes receive the control file contents from the postmaster's memory, just as fork'd children would inherit it
- This ensures identical semantics between fork and EXEC_BACKEND paths: all children see the same control file snapshot that the postmaster had
Advantages:
- No torn reads possible (data comes from memory, not disk)
- No behavioral divergence between fork and EXEC_BACKEND
- No retry loops or sleep calls
- Conceptually cleaner — matches the fork() semantics exactly
Trade-off: Slightly increases the size of BackendParameters, but ControlFileData is small (~300 bytes) so this is negligible.
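The idea can be sketched as a save/restore pair around a parameter block. The structs below are heavily reduced and every toy_ name is hypothetical. The real BackendParameters carries many more fields, and the actual patch may lay things out differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for ControlFileData. */
typedef struct ToyControlFileData
{
    uint64_t system_identifier;
    uint64_t checkpoint_lsn;
    uint32_t timeline_id;
} ToyControlFileData;

/* Hypothetical reduced parameter block handed to an exec'd child. */
typedef struct ToyBackendParameters
{
    int32_t            my_backend_slot; /* placeholder for existing fields */
    ToyControlFileData control_file;    /* the proposed addition */
} ToyBackendParameters;

/* Postmaster side: copy its in-memory snapshot into the parameter block. */
static void
toy_save_backend_variables(ToyBackendParameters *param, int32_t slot,
                           const ToyControlFileData *postmaster_copy)
{
    memset(param, 0, sizeof(*param));
    param->my_backend_slot = slot;
    param->control_file = *postmaster_copy;
}

/* Child side: adopt the snapshot instead of re-reading pg_control. */
static void
toy_restore_backend_variables(const ToyBackendParameters *param,
                              ToyControlFileData *child_copy)
{
    *child_copy = param->control_file;
}
```

Because the child copies bytes that the postmaster captured earlier, a concurrent rewrite of the file on disk cannot affect what the child sees, and every child started from the same parameter block contents gets the same snapshot.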
Key Technical Context
The BackendParameters mechanism already exists for passing other postmaster state to exec'd children in EXEC_BACKEND builds (signal handles, socket descriptors, shared memory attachment info, etc.). Adding the control file contents is a natural extension of this pattern.
The earlier fix referenced above (commit 5725e4ebe) dealt with the frontend case, where there's no postmaster to provide the data, so retrying is the only option there. For the backend case under EXEC_BACKEND, the postmaster-mediated approach is superior because the postmaster already has a consistent copy.
Conclusion
Alexander Korotkov's approach of passing control file data through BackendParameters is the preferred solution. It eliminates the class of torn-read bugs for EXEC_BACKEND while simultaneously ensuring fork/exec behavioral parity — an important property for Windows platform correctness and for using EXEC_BACKEND as a testing/debugging tool on Unix.