Core Problem: A Race Between ProcSignalInit and EmitProcSignalBarrier
This thread uncovers a long-latent but recently-unmasked race condition in PostgreSQL's ProcSignalBarrier (PSB) machinery — the mechanism used to force all backends to acknowledge and act on a piece of globally-changed state (e.g., smgr invalidations, online wal_level changes, data checksum status changes, logical-decoding xlog info updates).
The Barrier Protocol, Briefly
ProcSignal slots in shared memory carry two relevant fields per process:
- pss_pid: the process's PID, used as both a liveness flag ("slot occupied") and the signal target.
- pss_barrierGeneration: the last barrier generation this process has absorbed.
The emitter (EmitProcSignalBarrier) bumps a global generation counter, ORs the relevant barrier-type bit into each live slot, and sends SIGUSR1 to each occupant; WaitForProcSignalBarrier then waits until every slot's pss_barrierGeneration catches up to the emitted generation. The signal handler in the receiver calls HandleProcSignalBarrierInterrupt, which marks the barrier as pending; the receiver's next CHECK_FOR_INTERRUPTS() (CFI) then runs ProcessProcSignalBarrier to dispatch each pending barrier bit and finally advance its pss_barrierGeneration.
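The protocol just described can be sketched as a small single-process model. This is a simplification under stated assumptions, not PostgreSQL's actual code: ModelSlot, pending_bits, and the model_* functions are hypothetical names, signals are omitted, and only pss_pid and pss_barrierGeneration correspond to fields named in the text.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4

/* Simplified stand-in for one ProcSignal slot (not the real layout). */
typedef struct
{
    atomic_uint pss_pid;                        /* 0 means the slot is free */
    atomic_uint_fast64_t pss_barrierGeneration; /* last generation absorbed */
    atomic_uint pending_bits;                   /* hypothetical: barrier bits to process */
} ModelSlot;

static ModelSlot model_slots[NUM_SLOTS];
static atomic_uint_fast64_t model_global_gen;

/* Emitter: bump the generation, flag every live slot, return the new generation. */
uint64_t
model_emit_barrier(unsigned barrier_bit)
{
    uint64_t gen = atomic_fetch_add(&model_global_gen, 1) + 1;

    for (int i = 0; i < NUM_SLOTS; i++)
    {
        if (atomic_load(&model_slots[i].pss_pid) == 0)
            continue;           /* unused slot: nothing to notify */
        atomic_fetch_or(&model_slots[i].pending_bits, barrier_bit);
        /* the real emitter would also send SIGUSR1 here */
    }
    return gen;
}

/* Receiver: absorb pending bits, then advertise the generation it has reached. */
void
model_process_barrier(ModelSlot *slot)
{
    uint64_t target = atomic_load(&model_global_gen);
    unsigned bits = atomic_exchange(&slot->pending_bits, 0);

    (void) bits;                /* a real receiver dispatches one handler per set bit */
    atomic_store(&slot->pss_barrierGeneration, target);
}

/* Waiter's exit condition: every live slot has reached generation gen. */
bool
model_barrier_done(uint64_t gen)
{
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        if (atomic_load(&model_slots[i].pss_pid) != 0 &&
            atomic_load(&model_slots[i].pss_barrierGeneration) < gen)
            return false;
    }
    return true;
}
```

In this model a waiter would call model_barrier_done in a loop; it becomes true only after every occupied slot has run model_process_barrier for the emitted generation.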
The Race
The vulnerable window lives inside ProcSignalInit, which:
- Takes the slot's spinlock.
- Initializes pss_barrierGeneration to the current global generation (so the new process is treated as already caught up to anything emitted before it existed).
- Fills cancel-key fields, etc.
- Publishes pss_pid = MyProcPid via an atomic write (and releases the spinlock).
Meanwhile EmitProcSignalBarrier scans slots and — crucially — performs a lock-free pss_pid == 0 check to decide whether to skip a slot, before taking the spinlock to set the barrier bit. So the following interleaving, identified by Sawada, is possible:
- Newcomer: sets pss_barrierGeneration = global_gen under spinlock. pss_pid is still 0.
- Emitter: bumps global_gen to global_gen+1, scans slots, sees pss_pid == 0, skips this slot. It neither sets the barrier flag nor sends SIGUSR1.
- Newcomer: writes pss_pid = MyProcPid, becomes visible.
- Waiter (WaitForProcSignalBarrier): now sees a live slot with pss_barrierGeneration = global_gen < global_gen+1 and waits forever. The newcomer has no pending flag, no signal, no reason ever to update its generation.
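The interleaving can be replayed deterministically in a single thread, which makes the stuck end state easy to see. This is only an illustrative model: RaceSlot, barrier_flagged, the PID 4242, and the starting generation 7 are made up for the sketch, and the spinlock is omitted because the emitter's check does not take it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified slot; field names follow the text above. */
typedef struct
{
    uint32_t pss_pid;
    uint64_t pss_barrierGeneration;
    bool     barrier_flagged;   /* hypothetical: "flag set and SIGUSR1 sent" */
} RaceSlot;

/* Replays the four steps in order; returns true if the waiter can never finish. */
bool
replay_race_waiter_stuck(void)
{
    uint64_t global_gen = 7;    /* arbitrary starting generation */
    RaceSlot slot = {0, 0, false};
    uint64_t emitted;

    /* 1. Newcomer snapshots the generation; its PID is not yet published. */
    slot.pss_barrierGeneration = global_gen;

    /* 2. Emitter bumps the generation and skips the slot because pss_pid == 0. */
    emitted = ++global_gen;
    if (slot.pss_pid != 0)
        slot.barrier_flagged = true;

    /* 3. Newcomer publishes its PID and becomes visible. */
    slot.pss_pid = 4242;

    /* 4. Waiter: the slot is live and behind, and nothing will ever advance it. */
    return slot.pss_pid != 0 &&
           slot.pss_barrierGeneration < emitted &&
           !slot.barrier_flagged;
}
```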
The barrier-generation bookkeeping that was supposed to make late-joiners automatically "caught up" is defeated because the generation snapshot is taken before the PID publication, but the emitter orders the work the opposite way (check PID → then bump flags).
Why This Suddenly Matters
The PSB mechanism has existed since v14, but:
- In v14 it was unused.
- In v15–v18 it was used only for smgr invalidations driven by DROP DATABASE / DROP TABLESPACE, which are rare, operator-driven events. Alexander Lakhin's test-suite sweep confirmed this: the failure mode reproduces on 033_replay_tsp_drops, 030_stats_cleanup_replica_standby, 040_standby_failover_slots_sync (DROP DATABASE redo), and 002_compare_backups_pitr1, all of which are DROP-DB/TSP paths.
- In master, commit 67c20979c (Sawada, 2025-12-23) added UpdateLogicalDecodingStatusEndOfRecovery(), which calls EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO) unconditionally during StartupXLOG, that is, on every startup, even after a clean shutdown. The startup process races with the checkpointer and bgwriter that the postmaster just forked moments earlier, making the window hit in ordinary pg_ctl start. Online wal_level changes and checksum-status changes further increase exposure.
This is why Alexander's bisect lands on 67c20979c even though the underlying bug has been present since v15: the bug went from "hits on rare DDL" to "hits during every boot."
Matthias's Initial (Mis)diagnosis
Matthias originally hypothesized a different race: that AuxiliaryProcessMainCommon registers the ProcSignal slot before the aux process installs its SIGUSR1 handler, so signals delivered in that window could be lost. Andres correctly rebutted this: postmaster children are forked with all signals blocked (sigprocmask(SIG_SETMASK, &BlockSig, ...)), and the unblock happens only after signal handlers are installed. Thus a signal arriving in the "window" is merely pended by the kernel and delivered at unblock time — no loss. Matthias conceded this was a misidentification.
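Andres's point about blocked signals can be checked with plain POSIX calls, independent of PostgreSQL. The sketch below assumes a single-threaded process; demo_sigusr1_handler and blocked_signal_is_not_lost are hypothetical names, and the process signals itself to stand in for the emitter.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigusr1 = 0;

static void
demo_sigusr1_handler(int signo)
{
    (void) signo;
    got_sigusr1 = 1;
}

/* Returns true if a signal sent before the handler existed is still delivered. */
bool
blocked_signal_is_not_lost(void)
{
    sigset_t blocked;
    struct sigaction sa;

    got_sigusr1 = 0;

    /* Start with SIGUSR1 blocked, as in a freshly forked postmaster child. */
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGUSR1);
    sigprocmask(SIG_BLOCK, &blocked, NULL);

    /* The signal arrives before any handler is installed: it stays pending. */
    kill(getpid(), SIGUSR1);

    /* Install the handler while the signal is still blocked. */
    sa.sa_handler = demo_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    /* Unblocking delivers the pending signal to the newly installed handler. */
    sigprocmask(SIG_UNBLOCK, &blocked, NULL);

    return got_sigusr1 == 1;
}
```

The signal is queued by the kernel while blocked and handed to whatever handler is installed at unblock time, which is why the slot-before-handler ordering loses nothing.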
However, Andres's ancillary observation — that most signal-handler setup should be moved into AuxiliaryProcessMainCommon, with per-process exceptions (e.g., checkpointer's SIGTERM handling) configured after — remains an independent cleanup suggestion, not the fix.
Proposed Fix
Sawada's patch targets the actual race by reordering steps within ProcSignalInit and/or changing how the emitter detects occupancy, so that PID publication and generation initialization are ordered consistently with the emitter's scan: any emitter that observes pss_pid != 0 is guaranteed to also see, or to install, the barrier flags. The essential invariant needed is:
It must be impossible for an emitter to skip a slot (on the basis of pss_pid == 0) and still have the waiter later observe that slot as live with a stale pss_barrierGeneration.
This can be achieved either by:
- Publishing pss_pid before reading the global barrier generation into pss_barrierGeneration, and having the emitter take the per-slot lock before checking PID, so the emitter either (a) finds pss_pid == 0 and the newcomer will later read a generation ≥ this emission, or (b) finds pss_pid != 0 and flags + signals it.
- Having ProcSignalInit re-read the global generation after publishing PID, under a barrier that pairs with the emitter's scan.
The thread's patch takes the approach of re-synchronizing the generation capture with PID publication so that the emitter's lock-free PID check is safe.
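As an illustration of the first option listed above (not necessarily what the thread's patch does), the sketch below replays the two orders in which the emitter's and the newcomer's critical sections can run once both take the per-slot lock. The lock itself is not modeled; each helper stands for one locked section. FixedSlot, barrier_flagged, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t pss_pid;
    uint64_t pss_barrierGeneration;
    bool     barrier_flagged;   /* hypothetical: "flag set and SIGUSR1 sent" */
} FixedSlot;

static uint64_t fixed_global_gen;

/* Newcomer's locked section: publish the PID first, then snapshot the generation. */
static void
newcomer_register(FixedSlot *slot)
{
    slot->pss_pid = 4242;
    slot->pss_barrierGeneration = fixed_global_gen;
}

/* Emitter's locked per-slot step; the generation was bumped before this runs. */
static void
emitter_visit(FixedSlot *slot)
{
    if (slot->pss_pid != 0)
        slot->barrier_flagged = true;
}

/* Returns true if the waiter would be left with a live, stale, unflagged slot. */
bool
fixed_order_waiter_stuck(bool emitter_visits_first)
{
    FixedSlot slot = {0, 0, false};
    uint64_t emitted;

    fixed_global_gen = 7;       /* arbitrary starting generation */

    if (emitter_visits_first)
    {
        emitted = ++fixed_global_gen;
        emitter_visit(&slot);       /* sees pss_pid == 0 and skips */
        newcomer_register(&slot);   /* snapshots the already-bumped generation */
    }
    else
    {
        newcomer_register(&slot);   /* snapshots the old generation */
        emitted = ++fixed_global_gen;
        emitter_visit(&slot);       /* sees a live slot and flags it */
    }

    return slot.pss_pid != 0 &&
           slot.pss_barrierGeneration < emitted &&
           !slot.barrier_flagged;
}
```

Both orders return false in this model. The remaining order, where the bump precedes registration but the visit follows it, only adds a redundant flag to a slot that is already caught up.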
Secondary Fix: InitializeProcessXLogLogicalInfo Ordering
Sawada identified a related but distinct correctness bug: InitializeProcessXLogLogicalInfo() is called in BaseInit() before ProcSignalInit(). A new process can therefore:
- Read the current XLogLogicalInfo state.
- (Emitter updates state and emits PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO, skipping this process because its slot is empty.)
- Register its procsignal slot with the new barrier generation, thereby claiming to be caught up when it actually holds stale logical-info state.
The fix mirrors what was already done for InitLocalDataChecksumState: move InitializeProcessXLogLogicalInfo to after ProcSignalInit, so that either the process reads fresh state (post-registration) or receives the barrier. Matthias's follow-up patch adds assertions/sanity code that the procsignal subsystem is live at the point these shared-state subsystems are initialized — a belt-and-suspenders check to prevent regressions of this ordering rule.
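The effect of the ordering rule can be sketched with a toy model that compares the two initialization orders against an emitter that changes the shared state in between. Everything here is hypothetical scaffolding (SharedModel, ProcModel, and the function names); registration is treated as atomic, so the ProcSignalInit race discussed earlier is deliberately out of scope.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    bool     logical_info_enabled;  /* stand-in for the shared XLogLogicalInfo state */
    uint64_t global_gen;
} SharedModel;

typedef struct
{
    bool     registered;            /* procsignal slot published */
    bool     barrier_pending;
    uint64_t my_gen;
    bool     local_logical_info;    /* process-local cached copy */
} ProcModel;

/* Emitter: change shared state, bump the generation, notify registered processes. */
static void
emitter_flip_state(SharedModel *shared, ProcModel *proc)
{
    shared->logical_info_enabled = true;
    shared->global_gen++;
    if (proc->registered)
        proc->barrier_pending = true;   /* unregistered processes are skipped */
}

/* Returns true if the process ends up with stale local state and no pending barrier. */
bool
startup_ends_up_stale(bool init_before_register)
{
    SharedModel shared = {false, 0};
    ProcModel proc = {false, false, 0, false};

    if (init_before_register)
    {
        proc.local_logical_info = shared.logical_info_enabled;  /* reads old state */
        emitter_flip_state(&shared, &proc);                     /* process is skipped */
        proc.registered = true;
        proc.my_gen = shared.global_gen;                        /* claims caught up */
    }
    else
    {
        proc.registered = true;
        proc.my_gen = shared.global_gen;
        emitter_flip_state(&shared, &proc);                     /* process is flagged */
        proc.local_logical_info = shared.logical_info_enabled;  /* reads new state */
    }

    return proc.local_logical_info != shared.logical_info_enabled &&
           !proc.barrier_pending;
}
```

Only the read-then-register order leaves the process silently stale, which is the case the reordering removes.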
Backpatching Scope
Sawada confirmed the race exists v15..master. v14 has the code but no callers, so it's immune in practice. The patch should be backpatched to v15. The secondary InitializeProcessXLogLogicalInfo fix is master-only (67c20979c is master-only).
Reproduction Methodology
Alexander's contribution is notable: injecting a pg_usleep(10000) between memcpy(...pss_cancel_key...) and pg_atomic_write_u32(&slot->pss_pid, MyProcPid) in ProcSignalInit reliably turns a rare buildfarm flake into a deterministic failure, both confirming the diagnosis and providing a regression-test harness. Running this against the full suite revealed the v15–v18 exposure through DROP DATABASE / DROP TABLESPACE redo paths that use smgr PSBs — tests that had probably been flaky for years without anyone correlating them (he cites a 2025 report that was likely the same bug).
Architectural Lessons
- Lock-free occupancy checks against multi-field shared-memory state are subtle. The emitter's "is this slot live?" fast path (pss_pid == 0) must be paired with initialization ordering in the registrant that makes the fast path monotonic with respect to any state the waiter will later inspect.
- Barrier-generation snapshots at registration are dangerous if taken before publication. The newcomer effectively claims "I'm caught up to generation N" before it's reachable by emitters, creating a window in which an emission of N+1 can skip it while leaving the waiter to observe its stale generation.
- Signal-barrier expansion has latent cost. Every new barrier type (UPDATE_XLOG_LOGICAL_INFO, checksum status, online wal_level) increases the rate at which latent bugs in this infrastructure become user-visible. The v15-era design was "safe by rarity"; that is no longer true.
- Initialization order of shared-state subsystems vs. ProcSignalInit is now a load-bearing invariant. Any subsystem whose state is maintained via PSBs must initialize after ProcSignalInit. This ought to be documented and possibly asserted.