Technical Analysis: Preventing Repeated Deadlock-Check Signals in Standby Buffer Pin Waits
Core Problem
In PostgreSQL's hot standby recovery conflict resolution, there is a behavioral asymmetry between two conflict paths that share similar architecture. When the startup process on a standby needs to replay WAL that requires exclusive access to a buffer page (e.g., during VACUUM cleanup), it must resolve conflicts with backends that hold buffer pins on that page. This is handled by ResolveRecoveryConflictWithBufferPin().
The Signal Storm Problem
The control flow works as follows:
LockBufferForCleanup()callsResolveRecoveryConflictWithBufferPin()- A
deadlock_timeouttimer is armed - When the timeout fires,
got_standby_deadlock_timeoutis set to true - The function sends
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN(the deadlock-check signal) to the conflicting backend - The function returns to
LockBufferForCleanup() - The caller loops back, re-enters the function, arms another
deadlock_timeout - The signal fires again after the next interval
This creates a pattern where the startup process repeatedly signals the same backend every deadlock_timeout interval (default 1s), even though the backend has already been notified. The patch author demonstrated via strace that this results in 9 SIGUSR1 broadcasts over a 10-second window on unpatched code.
Historical Context and the XXX Comment
The lock-conflict path (ResolveRecoveryConflictWithLock) had the identical problem and was fixed in commit 8900b5a9d59a. The fix there was to add a second ProcWaitForSignal() call after sending the deadlock-check signal, causing the startup process to sleep until the actual conflict resolves (i.e., the backend unpins the buffer) rather than returning to the caller's loop.
The buffer-pin path was knowingly left unfixed, marked with an XXX comment asking whether the same fix should be applied. This patch finally addresses that TODO.
Proposed Solution
The fix is architecturally simple and mirrors the established pattern from the lock-conflict path:
- After sending the
RECOVERY_CONFLICT_BUFFERPIN_DEADLOCKsignal, resetgot_standby_deadlock_timeout = false - Call
ProcWaitForSignal()to block the startup process - The startup process now waits until
UnpinBuffer()wakes it (viaProcSendSignal), rather than looping back to the caller
This ensures only one deadlock-check signal is sent per conflict episode, reducing unnecessary signal traffic and making the behavior consistent with the lock-conflict resolution path.
v2 Additions
The second version also initializes got_standby_delay_timeout = false before enabling the STANDBY_TIMEOUT timer. This is a defensive coding improvement for symmetry with how got_standby_deadlock_timeout is already initialized. While likely harmless in current code (the timeout handler sets it to true, and the flag is only checked after the wait), it prevents potential issues if code paths change in the future.
Architectural Significance
Why This Matters Beyond "Just Signals"
-
Signal coalescing risk: SIGUSR1 is used as a multiplexed notification mechanism in PostgreSQL. Excessive signals create unnecessary wake-ups and processing overhead in the target backend's signal handler, which must check multiple flag variables.
-
Recovery conflict escalation: After
max_standby_streaming_delay, the standby will cancel the conflicting query. The repeated signaling doesn't change this behavior but adds useless overhead during the wait window. -
Consistency principle: Having two parallel code paths (lock conflicts vs. buffer-pin conflicts) with different behaviors for the same logical pattern creates maintenance burden and confusion for developers.
Classification Debate: Bug vs. Improvement
The thread contains an interesting classification discussion:
- Fujii and Paquier classify this as an improvement (not a bug), meaning no backpatching and targeting v20
- The patch author agrees, updating the CF entry to "Performance"
- Álvaro Herrera raises a late question about whether this is related to a separate thread by Vitaly (at postgrespro.ru) that claims this IS a bug going back to pg15
This classification matters operationally: if it's a bug, it should be backpatched to supported versions; if it's an improvement, it waits for the next major release. The repeated signaling doesn't cause incorrect behavior (the conflict still resolves properly), but it does cause unnecessary work. The "bug" interpretation likely relates to whether the repeated signals could cause visible user-facing issues (e.g., log spam, performance degradation under heavy conflict scenarios).
Testing Approach
The patch author validated using the existing recovery conflict test infrastructure (src/test/recovery/t/031_recovery_conflict.pl) with a specific scenario:
- Primary:
VACUUM FREEZEon a table - Standby: An aborted INSERT + open cursor holding a buffer pin
- Measurement:
stracecounting SIGUSR1 sends over a 10-second window - Result: 9 signals (unpatched) → 1 signal (patched)