Technical Analysis: REPACK Decoding Worker Leak on FATAL Exit
The Core Problem
This thread addresses a resource leak bug in PostgreSQL's REPACK (CONCURRENTLY) infrastructure where a forced backend termination via pg_terminate_backend() leaves an orphaned background worker and its associated temporary replication slot in an active state indefinitely.
Why This Matters Architecturally
The bug exposes a fundamental tension in PostgreSQL's error handling and process lifecycle model:
-
FATAL vs ERROR exit paths: PostgreSQL's
ereport(FATAL)— triggered here byProcDiePendingwhenpg_terminate_backend()sends SIGTERM — takes a fundamentally different code path than normal error handling. Critically, FATAL exits bypassPG_FINALLYblocks. This is by design:PG_FINALLYis implemented viasigsetjmp/siglongjmpand only executes onereport(ERROR)which longjmps back to the error handler.FATALcallsproc_exit()directly, never returning through the longjmp path. Any cleanup logic registered inPG_FINALLYis therefore silently skipped. -
Background worker lifecycle coupling: The REPACK decoding worker is a background worker (launched via
RegisterDynamicBackgroundWorker()) whose lifecycle is logically coupled to its launching backend. However, background workers are independent processes managed by the postmaster. There is no built-in mechanism for the postmaster to automatically terminate a background worker when its launching backend exits — this coupling must be explicitly managed by the code. -
Replication slot pinning: The decoding worker holds a temporary replication slot (
RS_TEMPORARY). While temporary slots are designed to be dropped when the owning process exits, the slot remains active as long as the worker process is alive. An active replication slot prevents WAL recycling, meaning this leak can cause unbounded WAL accumulation — a serious production concern that can fill disk and cause a cluster-wide outage.
Reproduction and Impact
The reproduction is straightforward:
- Session 1 starts
REPACK (CONCURRENTLY)on a large table (long-running operation) - Session 2 calls
pg_terminate_backend()on the REPACK session - The decoding worker continues running, holding its slot active
- The slot cannot be dropped externally because it is still in use by the running worker
- WAL accumulates indefinitely
This is a real operational hazard: DBAs routinely use pg_terminate_backend() to kill long-running sessions, and REPACK on large tables is exactly the kind of operation that tends to get killed.
Proposed Solution and Its Flaw
The v1 Patch Approach
Baji Shaik's patch registers an on_proc_exit() callback when the decoding worker is started. This callback calls TerminateBackgroundWorker() to signal the worker to shut down. The design deliberately avoids calling WaitLatch() or any blocking wait inside the callback, because the proc_exit callback runs late in process shutdown where many subsystems (including the latch infrastructure) may no longer be safe to use. Instead, it relies on the worker eventually exiting and the RS_TEMPORARY slot being auto-dropped.
This is architecturally sound in principle: on_proc_exit() callbacks are invoked during FATAL exits (since FATAL calls proc_exit()), unlike PG_FINALLY blocks. This is the standard PostgreSQL pattern for ensuring cleanup happens regardless of exit path.
The Segfault Problem
Sami Imseih's testing reveals the patch causes a segmentation fault (signal 11) during the termination sequence, which cascades into a full postmaster restart (all connections killed, crash recovery triggered). This is far worse than the original bug.
The segfault strongly suggests a use-after-free or null pointer dereference in the on_proc_exit callback. The most likely causes are:
-
BackgroundWorkerHandle lifetime: The
BackgroundWorkerHandleobtained fromRegisterDynamicBackgroundWorker()may be allocated in a memory context that has already been destroyed by the time theon_proc_exitcallback runs. Duringproc_exit(), memory contexts are torn down in a specific order, and if the handle was in a transaction-scoped or subtransaction-scoped context, it would be freed before late-stageon_proc_exitcallbacks execute. -
Shared memory access after detach:
TerminateBackgroundWorker()accesses shared memory structures (BackgroundWorkerSlotin theBackgroundWorkerArray). If shared memory has already been partially detached by the time the callback runs, this would segfault. -
Callback ordering:
on_proc_exitcallbacks run in LIFO order. If other cleanup callbacks have already invalidated state that the new callback depends on, the result is undefined behavior.
Sami's comment that "on_proc_exit() is not the right place" suggests the callback should likely be registered via before_shmem_exit() instead, which runs earlier in the shutdown sequence while shared memory is still fully accessible. Alternatively, the BackgroundWorkerHandle needs to be copied into TopMemoryContext to ensure it survives context teardown.
Unresolved Design Questions
-
before_shmem_exit()vson_proc_exit(): The fix likely needsbefore_shmem_exit()sinceTerminateBackgroundWorker()accesses shared memory. Butbefore_shmem_exitcallbacks must be careful not to do things that assume full transaction state. -
Race conditions: Even with a correct callback, there's a window between the worker being signaled and actually exiting. If the postmaster also restarts (e.g., due to the segfault in v1), the worker might be killed by SIGQUIT anyway, but under normal operation, the fire-and-forget approach means the slot remains briefly active after the launching backend exits.
-
Should REPACK use
bgw_notify_pidorWL_POSTMASTER_DEATH?: A more robust design might have the decoding worker itself monitor its launching backend's liveness (e.g., via the process latch or a shared memory flag) and self-terminate when the launcher disappears. This would be defense-in-depth rather than relying solely on the launcher's exit callback. -
Broader pattern concern: This bug pattern (FATAL bypassing PG_FINALLY cleanup of background workers) likely affects other features that launch background workers from user backends. The fix should ideally establish a reusable pattern.
Current State
The thread is in early review. The v1 patch correctly identifies the root cause and the right general approach (proc_exit callback), but the implementation has a critical bug causing server crashes. A v2 patch is needed that either uses before_shmem_exit() or ensures the BackgroundWorkerHandle is in a long-lived memory context. The discussion is still active with only the initial report and one review.