Monthly Summary: [PATCH] Fix LISTEN Startup Race with Direct Advancement — May 2026
Overview
This thread identifies and fixes a correctness bug in PostgreSQL's asynchronous notification system (async.c) introduced by commit 282b1cd. A race between the two phases of LISTEN registration allows notifications to be permanently lost: a false negative that violates the fundamental LISTEN/NOTIFY contract. The fix is minimal (removing a single early-continue check). It was posted, reviewed, revised, and then reviewed by a committer, all within the month.
The Bug
The race exists in the two-phase LISTEN commit path:
- PreCommit_Notify(): registers the backend in shared listener data and records the queue tail position.
- AtCommit_Notify(): finalizes by setting listening = true in the shared channel map.
Between these phases, a concurrent NOTIFY can commit. SignalBackends() skips entries with listening = false, so the staged listener is not signaled and nothing blocks direct advancement. Direct advancement then moves the listener's queue pointer past the notification, and the notification is permanently lost.
This is distinct from the documented LISTEN startup race (a harmless false-positive where a notification arrives for already-observed work). The new bug is a false-negative — the application never learns about a state change.
The Fix
Remove the listening = false skip in SignalBackends():
- if (!listeners[j].listening)
- continue; /* ignore not-yet-committed listeners */
This converts the dangerous false-negative into a benign false-positive (possible spurious wakeup during the tiny PreCommit→AtCommit window).
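The schedule can be illustrated with a small toy model. This is only a sketch and is not the async.c code: ToyListener, toy_signal, and toy_schedule_delivers are hypothetical names, the model has a single listener and a single notification, and it assumes the position recorded in the first phase equals the queue head at that moment. The skip_uncommitted flag stands in for the check that the fix removes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the async.c structures. */
typedef struct
{
    bool listening;  /* set by the second (AtCommit) phase */
    int  pos;        /* index of the next queue entry this backend would read */
    bool signaled;   /* true once the notifier has asked this backend to wake up */
} ToyListener;

/*
 * Toy notifier step for one notification occupying queue slot
 * [old_head, new_head). With skip_uncommitted set, a staged listener is
 * treated as uninterested: it is not signaled, and its position is
 * advanced directly if it sits exactly at the old head.
 */
static void
toy_signal(ToyListener *l, int old_head, int new_head, bool skip_uncommitted)
{
    assert(old_head < new_head);

    if (skip_uncommitted && !l->listening)
    {
        if (l->pos == old_head)
            l->pos = new_head;  /* direct advancement past the notification */
        return;                 /* no signal is sent */
    }
    l->signaled = true;
}

/*
 * Runs the schedule: first phase, concurrent NOTIFY commit, second phase.
 * Returns true if the listener ends up reading the notification.
 */
static bool
toy_schedule_delivers(bool skip_uncommitted)
{
    int         head = 0;
    ToyListener l = { false, 0, false };

    /* Phase 1: listener is staged and records a queue position
     * (assumed here to be the current head). */
    l.pos = head;

    /* A concurrent NOTIFY commits between the two phases. */
    int old_head = head;
    head = old_head + 1;
    toy_signal(&l, old_head, head, skip_uncommitted);

    /* Phase 2: the listener is finalized. */
    l.listening = true;

    /* The entry at old_head is read only if the listener was woken
     * and its position was not moved past that entry. */
    return l.signaled && l.pos <= old_head;
}
```

Calling toy_schedule_delivers(true) models the behavior before the fix: the staged listener is skipped, its position is advanced, and the function returns false. Calling toy_schedule_delivers(false) models the behavior with the skip removed: the listener is signaled, its position is left alone, and the function returns true.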
Key Developments
- v1 patch series posted as three patches: a test for the bug (0001), a test for the documented race (0002), and the actual fix (0003).
- Arseniy Mukhin reviewed, confirmed reproducibility, and identified a second bad schedule — a delayed-delivery variant where direct advancement doesn't occur but the backend is never signaled. The same fix resolves both variants.
- v2 patch series posted to fix a packaging issue where 0003 accidentally duplicated test content from 0001.
- Tom Lane (committer) reviewed and agrees the fix is correct but raises two concerns:
  - The ListenerEntry.listening flag's semantics are fundamentally changed: it no longer gates signaling. He suggests renaming but hasn't proposed a concrete alternative.
  - He is disinclined to commit the isolation test cases (0001, 0002), citing disproportionate CI cost relative to value.
Current Status
The fix itself has committer agreement on correctness. Open questions remain about flag naming/semantics and whether the test cases will be included. No final commit has occurred yet.