[PATCH] Fix LISTEN startup race with direct advancement

First seen: 2026-05-19 20:37:56+00:00 · Messages: 8 · Participants: 3

Latest Update

2026-06-01 · claude-opus-4-6

Monthly Summary: [PATCH] Fix LISTEN Startup Race with Direct Advancement — May 2026

Overview

This thread identifies and fixes a correctness bug in PostgreSQL's asynchronous notification system (async.c) introduced by commit 282b1cd. A race condition between LISTEN registration phases allows notifications to be permanently lost — a false-negative that violates the fundamental LISTEN/NOTIFY contract. The fix is minimal (removing a single early-continue check) and was posted, reviewed, revised, and reviewed again by a committer within the month.

The Bug

The race exists in the two-phase LISTEN commit path:

  1. PreCommit_Notify() — registers the backend in shared listener data and records the queue tail position.
  2. AtCommit_Notify() — finalizes by setting listening = true in the shared channel map.

Between these phases, a concurrent NOTIFY can commit. SignalBackends() skips entries with listening = false, so the staged listener is neither signaled nor protected from direct advancement. Direct advancement then moves the queue pointer past the notification, and the notification is permanently lost.

This is distinct from the documented LISTEN startup race (a harmless false-positive where a notification arrives for already-observed work). The new bug is a false-negative — the application never learns about a state change.

The Fix

Remove the listening = false skip in SignalBackends():

-           if (!listeners[j].listening)
-               continue;       /* ignore not-yet-committed listeners */

This converts the dangerous false-negative into a benign false-positive (possible spurious wakeup during the tiny PreCommit→AtCommit window).
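
The race and the effect of the removed check can be reproduced in a small, single-threaded model. This is only a sketch under stated assumptions, not code from async.c: the struct, the function names, and the skip_uncommitted switch are hypothetical, the queue is reduced to a single integer position, and the interleaving (pre-commit, concurrent NOTIFY, at-commit) is hard-coded rather than produced by real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one backend's entry for one channel.
 * None of these names are taken from async.c. */
typedef struct
{
    bool staged;     /* entry created by the pre-commit step */
    bool listening;  /* set by the at-commit step */
    bool signaled;   /* a notifier asked this backend to read the queue */
    int  queue_pos;  /* next queue slot this backend will read */
} ModelListener;

/* Phase 1: stage the entry and remember the current queue position. */
void
model_precommit(ModelListener *l, int current_pos)
{
    l->staged = true;
    l->listening = false;
    l->signaled = false;
    l->queue_pos = current_pos;
}

/* Phase 2: mark the entry as fully listening. */
void
model_atcommit(ModelListener *l)
{
    assert(l->staged);
    l->listening = true;
}

/* A concurrent NOTIFY commits: it appends one entry at *queue_head, then
 * either signals the listener or advances the listener's position directly.
 * skip_uncommitted = true models the behavior before the fix. */
void
model_notify(ModelListener *l, int *queue_head, bool skip_uncommitted)
{
    int old_head = *queue_head;

    *queue_head = old_head + 1;

    if (l->staged && (l->listening || !skip_uncommitted))
    {
        l->signaled = true;     /* the backend will read the queue itself */
        return;
    }

    /* Direct advancement: an unsignaled backend sitting at the old head
     * is moved past the new entry without being woken. */
    if (l->queue_pos == old_head)
        l->queue_pos = *queue_head;
}

/* Runs the interleaving and reports whether the notification is received. */
bool
model_race(bool skip_uncommitted)
{
    ModelListener l;
    int queue_head = 0;
    int slot;

    model_precommit(&l, queue_head);
    slot = queue_head;          /* slot the NOTIFY is about to occupy */
    model_notify(&l, &queue_head, skip_uncommitted);
    model_atcommit(&l);

    /* Received only if the backend was signaled and has not been moved
     * past the slot that holds the notification. */
    return l.signaled && l.queue_pos <= slot;
}
```

In this model, model_race(true) returns false (the notification is lost) and model_race(false) returns true. The cost of the second behavior is that a staged listener whose transaction later aborts may have been signaled for nothing, which corresponds to the benign false-positive described above.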

Key Developments

Current Status

The fix itself has committer agreement on correctness. Open questions remain about flag naming/semantics and whether the test cases will be included. No final commit has occurred yet.

History (1 prior analysis)
2026-06-01 · claude-opus-4-6

Incremental Update: Patch Committed with Flag Rename

Summary

The thread reached resolution. Joel proposed a concrete rename for the listening flag, Tom accepted it, and the fix has been committed to the PostgreSQL repository.

Flag Rename Resolution

In response to Tom's concern about the changed semantics of the listening flag, Joel proposed renaming it to removeOnAbort with negated meaning. This reframes the flag's purpose clearly: instead of "is this listener active?" (which no longer gates signaling), it becomes "should this entry be removed if the registering transaction aborts?" This accurately captures the flag's remaining semantic role after the fix — it marks entries that are provisionally registered during the PreCommit phase and need cleanup on abort.

Tom found this acceptable ("at least I haven't a better idea") and committed the patch with additional comment adjustments.
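
The life cycle implied by the new flag name can be sketched as a tiny state model. Only the name removeOnAbort comes from the thread; the struct, the function names, and the rule that re-staging an existing entry leaves it untouched are assumptions made for illustration and are not taken from the committed code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a single channel-map entry. */
typedef struct
{
    bool present;        /* entry exists in the (modeled) channel map */
    bool removeOnAbort;  /* entry is provisional and must go away on abort */
} ModelEntry;

/* Pre-commit: add a provisional entry. An entry that already exists is
 * left alone (assumption), so an abort cannot remove it. */
void
entry_stage(ModelEntry *e)
{
    if (!e->present)
    {
        e->present = true;
        e->removeOnAbort = true;
    }
}

/* Commit: the entry becomes permanent. */
void
entry_commit(ModelEntry *e)
{
    assert(e->present);
    e->removeOnAbort = false;
}

/* Abort: drop the entry only if it was provisional. */
void
entry_abort(ModelEntry *e)
{
    if (e->present && e->removeOnAbort)
    {
        e->present = false;
        e->removeOnAbort = false;
    }
}

/* Signaling no longer consults the flag: any present entry is signaled. */
bool
entry_gets_signal(const ModelEntry *e)
{
    return e->present;
}
```

The point of the sketch is that the flag is consulted only in the abort path, while the signaling decision depends solely on whether the entry exists, which is why a name describing abort cleanup fits better than one describing listening state.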

Test Cases Dropped by Mutual Agreement

Joel explicitly agreed with Tom's position on not committing the isolation tests, stating "feel free to remove them." He noted it would be nice to have infrastructure for disposable review-validation tests that cfbot can exercise but that are never permanently committed. This was an aside, not a proposal for this patch.

Final Disposition

Tom committed the fix after "some more fiddling with the comments," indicating he made editorial adjustments to the surrounding documentation/comments in async.c beyond the mechanical rename. The core fix (removing the listening = false skip in SignalBackends()) plus the flag rename to removeOnAbort is now in the tree.