[PATCH] Fix ProcKill lock-group vs procLatch recycle race

First seen: 2026-04-27 08:14:59+00:00 · Messages: 16 · Participants: 4

Latest Update

2026-06-01 · claude-opus-4-6

Monthly Summary: Fix ProcKill lock-group vs procLatch recycle race (May 2026)

Overview

This thread addresses a genuine concurrency bug in ProcKill() that has existed since PostgreSQL 9.6 (when lock groups were introduced for parallel query). The bug manifests as a PANIC: latch already owned by PID ... when a newly-started backend inherits a PGPROC slot whose procLatch.owner_pid has not yet been cleared. The month saw the fix go from initial submission through expert review, partial commit, and iterative test refinement.

The Bug

The race window exists between releasing leader_lwlock (after lock-group dissolution) and calling DisownLatch(). During this window, a concurrent last-member follower can observe the leader outside the lock group, push the leader's PGPROC onto the freelist, and a new backend can pop that slot and PANIC on OwnLatch() because owner_pid is still set.

Cross-fork confirmation (Greenplum, Yugabyte) establishes this is a production-observable bug, not merely theoretical — forks exercising lock groups more aggressively hit it regularly.

The Fix (Two-Part)

  1. Reorder: Move SwitchBackToLocalLatch() and DisownLatch() to before the lock-group dissolution block, so the latch is disowned before any other member can observe the PGPROC as available.
  2. Single decision point under leader_lwlock: Compute push_self and push_leader booleans atomically while holding the lock, then perform a single consolidated freelist update. This eliminates double-push and inconsistent-ownership hazards.

Key Developments This Month

Michael Paquier's Review (Critical Milestone)

Michael provided the first independent expert review, confirming the bug is real and the fix direction correct. His key feedback:

Test Infrastructure Evolution

Final Patchset Structure (vVL2)

Current Status

The fix direction is validated by a committer. One preparatory patch is committed. The remaining patches (test, refactoring, fix) await Michael's detailed review. The test has been reworked to address the multi-wakeup rejection. Backport scope for the refactoring patch remains an open question.

History (1 prior analysis)
2026-06-01 · claude-opus-4-6

What's new in this round

The most significant development: Michael Paquier has committed the actual bug fix (patches 0004 and 0005 from the prior series, now combined as commit 84b9d6bceab6). The core race condition fix — moving DisownLatch() earlier and the push_leader/push_self refactoring — is now landed and backpatched. This resolves the primary goal of the thread.

The remaining discussion is entirely about the test, which Michael has ultimately decided is an anti-pattern in its current form and probably not worth committing.

1. Michael commits and backpatches the fix (84b9d6bceab6)

Michael states he "spent a good portion of the day looking at 0002 and 0003, simplified quite a few things... and after convincing myself that the new logic is sound I have applied and backpatched both patches." Here "0002 and 0003" refer to the refactoring and the fix (renumbered after 0001 was already committed and 0002-multi-wakeup was rejected). This means both the code-clarity refactoring AND the DisownLatch() reorder are now in the tree across stable branches.

This is the resolution of the bug that has existed since 9.6.

2. Michael declares the test an anti-pattern

After reviewing the rebased test, Michael identifies a fundamental problem: the fix itself breaks the test's synchronization mechanism. After 84b9d6bceab6, backends detach their latches earlier in ProcKill, which means the injection_points wait mechanism (which relies on latches) cannot function at that point in the code path. The test was relying on statement_timeout as a timing crutch, which Michael rejects as non-deterministic and wasteful on fast machines.

3. Michael outlines what would be needed to make the test viable

He identifies three prerequisite patches, all quite invasive:

  1. Redesign injection_points wait mechanism to use shared-memory flags instead of latches (so it works in contexts where latches are already detached)
  2. Extend injection_points_attach() with an optional PID argument (eliminating the need for a separate prockill_attach_injection_wait_pid() function)
  3. Only then: add the test itself

Michael suggests this work would be appropriate for v20 development, not stable branches, and expresses doubt that this specific test case justifies the infrastructure investment: "I am not sure that the case of this thread is good enough to justify these changes, TBH."

4. Vlad accepts the test won't be committed in current form

Vlad agrees with Michael's arguments, noting the test was always primarily a proof-of-bug artifact. He acknowledges that its complexity exceeds the fix itself and that minimizing changes to injection_points infrastructure was a design goal that ultimately couldn't be achieved.

5. Vlad pushes back on single-wakeup semantics documentation

In a separate message, Vlad reiterates his concern that having multiple waiters on an injection point with only-one-awakened semantics is confusing. He requests either code comments documenting this limitation or an error when a second waiter attempts to attach to an already-waited-on point. This is a minor injection_points API design suggestion, not directly related to the ProcKill fix.

Summary of thread status

The bug is fixed and backpatched. The test is effectively shelved pending future injection_points infrastructure improvements (shmem-based waits) that may arrive in v20. The thread is substantively complete.