2026-06-01 · claude-opus-4-6
Monthly Summary: Prevent Repeated Deadlock-Check Signals in Standby Buffer Pin Waits (May 2026)
Overview
This thread addresses unnecessary repeated signaling in PostgreSQL's hot standby recovery conflict resolution for buffer pin waits. When the startup process needs exclusive buffer access during WAL replay (e.g., VACUUM cleanup), it signals conflicting backends via ResolveRecoveryConflictWithBufferPin(). Due to the caller loop in LockBufferForCleanup(), the deadlock-check signal fires repeatedly every deadlock_timeout interval (default 1s), creating a "signal storm" — demonstrated as 9 SIGUSR1 sends over 10 seconds in unpatched code.
The Fix
The patch mirrors an existing fix from the lock-conflict path (commit 8900b5a9d59a), which had the same problem and was resolved years ago. The buffer-pin path was left with an XXX comment asking whether the same treatment should apply.
The solution:
- After sending
RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK, reset got_standby_deadlock_timeout = false
- Call
ProcWaitForSignal() to block the startup process until UnpinBuffer() wakes it
- This prevents the loop from re-arming the timer and re-sending the signal
v2 adds a defensive initialization of got_standby_delay_timeout = false before enabling STANDBY_TIMEOUT, for symmetry and future-proofing.
Classification Resolution
The patch is classified as a performance improvement (not a bug), targeting PostgreSQL v20 with no backpatching:
- The repeated signaling doesn't cause incorrect behavior — conflicts still resolve properly
- It only creates unnecessary overhead (extra signal handler invocations, wake-ups)
Relationship to Vitaly's Bug Report
Álvaro Herrera raised a question about whether this overlaps with a separate bug report by Vitaly (postgrespro.ru) affecting pg15+. The patch author clarified these are independent fixes targeting opposite failure modes:
- Vitaly's bug: Missed wake-up —
deadlock_timeout never fires due to spurious SIGALRM interaction
- This patch: Repeated wake-up —
deadlock_timeout fires correctly but re-arms in a loop
The ProcWaitForSignal() added by this patch sits inside the deadlock branch that Vitaly's scenario never reaches, so the two fixes don't conflict.
Test Results
Validated using src/test/recovery/t/031_recovery_conflict.pl with strace:
- Unpatched: 9 SIGUSR1 signals over 10 seconds
- Patched: 1 SIGUSR1 signal over 10 seconds
2026-06-01 · claude-opus-4-6
Incremental Update: Regression in log_recovery_conflict_waits Behavior Identified
New Technical Issue: Lost Early Conflict Logging
Two new participants identify a regression in the v2 patch: the fix for repeated signaling inadvertently suppresses the early log_recovery_conflict_waits message that should appear at deadlock_timeout.
The Problem:
With log_recovery_conflict_waits = on, the expected behavior is:
- At
deadlock_timeout (e.g., 100ms): log "recovery still waiting"
- At
max_standby_streaming_delay (e.g., 5s): cancel query and log "recovery finished waiting"
The v2 patch's second ProcWaitForSignal() blocks the startup process inside ResolveRecoveryConflictWithBufferPin() after sending the deadlock-check signal. This means control never returns to LockBufferForCleanup() at the expected time, so the caller cannot emit the early conflict log message. The "still waiting" message is delayed until ~5s (when the conflict actually resolves), making it useless for monitoring.
Demonstrated behavior difference:
| Scenario |
Master |
v2 Patch |
| "still waiting" log |
~101ms |
~5001ms |
| "finished waiting" log |
~5001ms |
~5001ms |
The logging_conflict Pattern from Lock Path
Zsolt Parragi first raised this concern, and Ilmar Yunusov confirmed it with a concrete reproduction. Both point to ResolveRecoveryConflictWithLock() as the model solution — that function has a logging_conflict argument that controls whether to skip the second wait:
- On the first call (before logging), the function returns after sending the deadlock-check signal so the caller can emit the log message
- On the second call (after logging), it enters the extended
ProcWaitForSignal() to avoid repeated signals
The patch needs to implement an equivalent mechanism for the buffer-pin path.
Paquier Confirms No-Backpatch Position
Michael Paquier reaffirms this is not backpatch-worthy, stating it "does not fix a problem, just improves a bit what we have."
Patch Status
The CF entry has been moved to Waiting on Author — the author needs to produce a v3 that preserves the early logging behavior while still preventing repeated signals.