Startup process deadlock: WaitForProcSignalBarriers vs aux process

First seen: 2026-04-22 11:21:02+00:00 · Messages: 9 · Participants: 4

Latest Update

2026-05-06 · opus 4.7

Core Problem: A Race Between ProcSignalInit and EmitProcSignalBarrier

This thread uncovers a long-latent but recently-unmasked race condition in PostgreSQL's ProcSignalBarrier (PSB) machinery — the mechanism used to force all backends to acknowledge and act on a piece of globally-changed state (e.g., smgr invalidations, online wal_level changes, data checksum status changes, logical-decoding xlog info updates).

The Barrier Protocol, Briefly

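ProcSignal slots in shared memory carry two relevant fields per process: pss_barrierGeneration, the most recent barrier generation the process has fully absorbed, and pss_barrierCheckMask, a bitmask of the barrier types it still has to handle.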

The emitter (EmitProcSignalBarrier) bumps a global generation counter, ORs the relevant barrier-type bit into each live slot, and sends SIGUSR1 to each occupant. WaitForProcSignalBarrier then waits until every slot's pss_barrierGeneration catches up to the emitted generation. In the receiver, the signal handler calls HandleProcSignalBarrierInterrupt, which flags the barrier as pending (the SIGUSR1 handler also sets the process latch); the receiver's next CHECK_FOR_INTERRUPTS (CFI) runs ProcessProcSignalBarrier, which dispatches each pending barrier bit and only then advances the process's pss_barrierGeneration.
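
The protocol above can be sketched as a small single-threaded model. This is an illustration under stated assumptions, not PostgreSQL's code: ModelSlot, model_emit_barrier, model_process_barrier and model_all_caught_up are hypothetical names, plain integers replace the real atomics and spinlock, and a boolean stands in for SIGUSR1 delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_SLOTS 4

/* Hypothetical, simplified stand-in for a ProcSignal slot. */
typedef struct
{
    uint32_t pid;                 /* 0 means the slot is unused */
    uint64_t barrier_generation;  /* last generation fully absorbed */
    uint32_t barrier_check_mask;  /* barrier-type bits still pending */
    bool     signalled;           /* stands in for a delivered SIGUSR1 */
} ModelSlot;

static uint64_t  model_global_gen = 0;
static ModelSlot model_slots[MODEL_NUM_SLOTS];

/* Emitter: bump the generation, then flag and "signal" every live slot. */
static uint64_t
model_emit_barrier(uint32_t type_bit)
{
    uint64_t gen = ++model_global_gen;

    for (int i = 0; i < MODEL_NUM_SLOTS; i++)
    {
        if (model_slots[i].pid == 0)
            continue;           /* unused slot: nothing to flag */
        model_slots[i].barrier_check_mask |= type_bit;
        model_slots[i].signalled = true;
    }
    return gen;
}

/* Receiver: dispatch pending bits, then advance its own generation. */
static void
model_process_barrier(ModelSlot *slot)
{
    uint64_t target = model_global_gen;

    if (!slot->signalled)
        return;                 /* nothing pending, generation stays put */
    slot->signalled = false;
    slot->barrier_check_mask = 0;       /* pretend every handler succeeded */
    slot->barrier_generation = target;  /* advance only after dispatch */
}

/* Waiter's exit condition: every live slot has reached the generation. */
static bool
model_all_caught_up(uint64_t gen)
{
    for (int i = 0; i < MODEL_NUM_SLOTS; i++)
    {
        if (model_slots[i].pid != 0 &&
            model_slots[i].barrier_generation < gen)
            return false;
    }
    return true;
}
```

In this model a waiter makes progress only because the receiver was flagged: a live slot that is never flagged never advances its generation, which is the property the race below exploits.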

The Race

The race window lives inside ProcSignalInit, which:

  1. Takes the slot's spinlock.
  2. Initializes pss_barrierGeneration to the current global generation (so the new process is treated as already caught up to anything emitted before it existed).
  3. Fills cancel-key fields, etc.
  4. Publishes pss_pid = MyProcPid via an atomic write (and releases the spinlock).

Meanwhile EmitProcSignalBarrier scans slots and — crucially — performs a lock-free pss_pid == 0 check to decide whether to skip a slot, before taking the spinlock to set the barrier bit. So the following interleaving, identified by Sawada, is possible:

  1. Newcomer: sets pss_barrierGeneration = global_gen under spinlock. pss_pid is still 0.
  2. Emitter: bumps global_gen to global_gen+1, scans slots, sees pss_pid == 0, skips this slot. It neither sets the barrier flag nor sends SIGUSR1.
  3. Newcomer: writes pss_pid = MyProcPid, becomes visible.
  4. Waiter (WaitForProcSignalBarrier): now sees a live slot with pss_barrierGeneration = global_gen < global_gen+1 and waits forever. The newcomer has no pending flag, no signal, no reason ever to update its generation.

The barrier-generation bookkeeping that was supposed to make late-joiners automatically "caught up" is defeated because the generation snapshot is taken before the PID publication, but the emitter orders the work the opposite way (check PID → then bump flags).
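
The four-step interleaving can be replayed deterministically. The sketch below is a single-threaded model, not the real code: RaceSlot and replay_lost_barrier are hypothetical names, the starting generation 7 and the PID 4242 are arbitrary, and each numbered step is treated as atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a ProcSignal slot. */
typedef struct
{
    uint32_t pid;                 /* 0 means "not yet published" */
    uint64_t barrier_generation;  /* generation the process claims to have */
    uint32_t barrier_check_mask;  /* pending barrier bits */
    bool     signalled;           /* stands in for SIGUSR1 */
} RaceSlot;

static uint64_t race_global_gen;

/* Replays the interleaving; returns true if the waiter can never finish. */
static bool
replay_lost_barrier(void)
{
    RaceSlot slot = {0, 0, 0, false};
    uint64_t emitted;
    bool     stale;
    bool     will_ever_advance;

    race_global_gen = 7;        /* arbitrary starting generation */

    /* 1. Newcomer snapshots the generation; pid is still 0. */
    slot.barrier_generation = race_global_gen;

    /* 2. Emitter bumps the generation and scans: pid == 0, slot skipped. */
    emitted = ++race_global_gen;
    if (slot.pid != 0)
    {
        slot.barrier_check_mask |= 0x1;
        slot.signalled = true;
    }

    /* 3. Newcomer publishes its pid and becomes visible. */
    slot.pid = 4242;

    /* 4. Waiter: the slot is live and behind, and nothing is pending
     *    that would ever make the newcomer advance its generation. */
    stale = (slot.pid != 0 && slot.barrier_generation < emitted);
    will_ever_advance = (slot.signalled || slot.barrier_check_mask != 0);

    return stale && !will_ever_advance;
}
```

Calling replay_lost_barrier() returns true: the slot ends at generation 7 while the waiter needs 8, with no flag and no signal outstanding.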

Why This Suddenly Matters

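The PSB mechanism has existed since v14, but until recently barriers were emitted only by rare operations such as DROP DATABASE / DROP TABLESPACE (the smgr PSBs), so the race window was almost never hit. After 67c20979c, by the thread's account, a barrier is emitted during startup.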

This is why Alexander's bisect lands on 67c20979c even though the underlying bug has been present since v15: the bug went from "hits on rare DDL" to "hits during every boot."

Matthias's Initial (Mis)diagnosis

Matthias originally hypothesized a different race: that AuxiliaryProcessMainCommon registers the ProcSignal slot before the aux process installs its SIGUSR1 handler, so signals delivered in that window could be lost. Andres correctly rebutted this: postmaster children are forked with all signals blocked (sigprocmask(SIG_SETMASK, &BlockSig, ...)), and the unblock happens only after signal handlers are installed. Thus a signal arriving in the "window" is merely pended by the kernel and delivered at unblock time — no loss. Matthias conceded this was a misidentification.

However, Andres's ancillary observation — that most signal-handler setup should be moved into AuxiliaryProcessMainCommon, with per-process exceptions (e.g., checkpointer's SIGTERM handling) configured after — remains an independent cleanup suggestion, not the fix.

Proposed Fix

Sawada's patch targets the actual race. It reorders the steps within ProcSignalInit and/or changes how the emitter detects occupancy, so that any emitter observing pss_pid != 0 is guaranteed to also see, or to install, the barrier flags, and any emitter that skips the slot is covered by the generation the newcomer records. The essential invariant needed is:

It must be impossible for an emitter to skip a slot (on the basis of pss_pid == 0) and still have the waiter later observe that slot as live with a stale pss_barrierGeneration.

This can be achieved either by:

  1. Publishing pss_pid before reading the global barrier generation into pss_barrierGeneration, and having the emitter take the per-slot lock before checking PID, so the emitter either (a) finds pss_pid == 0 and the newcomer will later read a generation ≥ this emission, or (b) finds pss_pid != 0 and flags + signals it.
  2. Having ProcSignalInit re-read the global generation after publishing PID, under a barrier that pairs with the emitter's scan.

The thread's patch takes the approach of re-synchronizing the generation capture with PID publication so that the emitter's lock-free PID check is safe.
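
To see why ordering the PID publication before the generation snapshot restores the invariant, the possible interleavings can be enumerated in a small model. This is a sketch of option 1 above, not Sawada's patch: FixSlot, waiter_stuck, count_stuck and the schedule strings are hypothetical, and every step is treated as atomic, which stands in for the per-slot lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified slot: just the fields the race involves. */
typedef struct
{
    uint32_t pid;      /* 0 means "not yet published" */
    uint64_t gen;      /* newcomer's recorded barrier generation */
    bool     flagged;  /* emitter set a barrier bit and signalled */
} FixSlot;

/*
 * Runs one schedule.  'N' is the newcomer's next step, 'E' the emitter's.
 * Newcomer steps: publish pid, snapshot generation (order is selectable).
 * Emitter steps: bump the generation, then scan the slot.
 * Returns true if the waiter would wait forever on this slot.
 */
static bool
waiter_stuck(const char *schedule, bool publish_pid_first)
{
    FixSlot  slot = {0, 0, false};
    uint64_t global_gen = 7;    /* arbitrary starting generation */
    uint64_t emitted = 0;
    int      n_step = 0;
    int      e_step = 0;

    for (const char *p = schedule; *p != '\0'; p++)
    {
        if (*p == 'N')
        {
            bool do_publish = ((n_step == 0) == publish_pid_first);

            if (do_publish)
                slot.pid = 4242;
            else
                slot.gen = global_gen;
            n_step++;
        }
        else
        {
            if (e_step == 0)
                emitted = ++global_gen;
            else if (slot.pid != 0)
                slot.flagged = true;
            e_step++;
        }
    }
    return slot.gen < emitted && !slot.flagged;
}

/* All orderings of two newcomer steps and two emitter steps. */
static const char *const SCHEDULES[] = {
    "NNEE", "NENE", "NEEN", "ENNE", "ENEN", "EENN"
};

static int
count_stuck(bool publish_pid_first)
{
    int stuck = 0;

    for (size_t i = 0; i < sizeof(SCHEDULES) / sizeof(SCHEDULES[0]); i++)
    {
        if (waiter_stuck(SCHEDULES[i], publish_pid_first))
            stuck++;
    }
    return stuck;
}
```

With the snapshot taken first (the buggy order), exactly one schedule, "NEEN", leaves the waiter stuck; it is the interleaving Sawada identified. With the PID published first, none of the six schedules does. The model says nothing about memory ordering on real hardware, which the actual fix still has to handle.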

Secondary Fix: InitializeProcessXLogLogicalInfo Ordering

Sawada identified a related but distinct correctness bug: InitializeProcessXLogLogicalInfo() is called in BaseInit() before ProcSignalInit(). A new process can therefore:

  1. Read the current XLogLogicalInfo state.
  2. (Emitter updates state and emits PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO, skipping this process because its slot is empty.)
  3. Register its procsignal slot with the new barrier generation, thereby claiming to be caught up when it actually holds stale logical-info state.

The fix mirrors what was already done for InitLocalDataChecksumState: move InitializeProcessXLogLogicalInfo to after ProcSignalInit, so that either the process reads fresh state (post-registration) or receives the barrier. Matthias's follow-up patch adds assertions/sanity code that the procsignal subsystem is live at the point these shared-state subsystems are initialized — a belt-and-suspenders check to prevent regressions of this ordering rule.
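
The ordering rule can be illustrated with the same kind of model. The names below (InitProc, read_shared_state, register_slot, emitter_update, ends_up_stale) are hypothetical stand-ins rather than PostgreSQL functions, and the emitter is assumed to run exactly between the two initialization steps, the worst case for the old order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process state for the model. */
typedef struct
{
    bool     registered;       /* has a procsignal slot */
    uint64_t gen;              /* generation recorded at registration */
    bool     barrier_pending;  /* emitter flagged this process */
    int      local_info;       /* process-local copy of the shared state */
} InitProc;

static int      init_shared_info;
static uint64_t init_global_gen;

static void
read_shared_state(InitProc *p)
{
    p->local_info = init_shared_info;
}

static void
register_slot(InitProc *p)
{
    p->gen = init_global_gen;
    p->registered = true;
}

/* Emitter: change the shared state, then barrier every registered process. */
static void
emitter_update(InitProc *p, int new_info)
{
    init_shared_info = new_info;
    ++init_global_gen;
    if (p->registered)
        p->barrier_pending = true;
}

/*
 * Returns true if the process ends up holding stale state with no barrier
 * pending that would ever refresh it.
 */
static bool
ends_up_stale(bool register_first)
{
    InitProc p = {false, 0, false, 0};

    init_shared_info = 0;
    init_global_gen = 1;

    if (register_first)
    {
        register_slot(&p);
        emitter_update(&p, 1);  /* emitter runs between the two steps */
        read_shared_state(&p);
    }
    else
    {
        read_shared_state(&p);
        emitter_update(&p, 1);  /* slot is empty, so the process is skipped */
        register_slot(&p);
    }
    return p.local_info != init_shared_info && !p.barrier_pending;
}
```

ends_up_stale(false) returns true (the old order: read first, register later), while ends_up_stale(true) returns false. If the emitter instead runs after both steps in the new order, the process is flagged and refreshes through the barrier, so either way it cannot stay stale.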

Backpatching Scope

Sawada confirmed that the race exists from v15 through master. v14 has the code but no callers, so it is immune in practice. The patch should be backpatched to v15. The secondary InitializeProcessXLogLogicalInfo fix is master-only, because 67c20979c is master-only.

Reproduction Methodology

Alexander's contribution is notable: injecting a pg_usleep(10000) between memcpy(...pss_cancel_key...) and pg_atomic_write_u32(&slot->pss_pid, MyProcPid) in ProcSignalInit reliably turns a rare buildfarm flake into a deterministic failure, both confirming the diagnosis and providing a regression-test harness. Running this against the full suite revealed the v15–v18 exposure through DROP DATABASE / DROP TABLESPACE redo paths that use smgr PSBs — tests that had probably been flaky for years without anyone correlating them (he cites a 2025 report that was likely the same bug).

Architectural Lessons

  1. Lock-free occupancy checks against multi-field shared-memory state are subtle. The emitter's "is this slot live?" fast path (pss_pid == 0) must be paired with initialization ordering in the registrant that makes the fast path monotonic with respect to any state the waiter will later inspect.
  2. Barrier-generation snapshots at registration are dangerous if taken before publication. The newcomer effectively claims "I'm caught up to generation N" before it's reachable by emitters, creating a window in which an emission of N+1 can skip it while leaving the waiter to observe its stale generation.
  3. Signal-barrier expansion has latent cost. Every new barrier type (UPDATE_XLOG_LOGICAL_INFO, checksum status, online wal_level) increases the rate at which latent bugs in this infrastructure become user-visible. The v15-era design was "safe by rarity"; that is no longer true.
  4. Initialization order of shared-state subsystems vs. ProcSignalInit is now a load-bearing invariant. Any subsystem whose state is maintained via PSBs must initialize after ProcSignalInit. This ought to be documented and possibly asserted.