[PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

First seen: 2026-05-12 23:26:23+00:00 · Messages: 2 · Participants: 2

Latest Update

2026-05-14 · claude-opus-4-6

Technical Analysis: REPACK Decoding Worker Leak on FATAL Exit

The Core Problem

This thread addresses a resource leak bug in PostgreSQL's REPACK (CONCURRENTLY) infrastructure where a forced backend termination via pg_terminate_backend() leaves an orphaned background worker and its associated temporary replication slot in an active state indefinitely.

Why This Matters Architecturally

The bug exposes a fundamental tension in PostgreSQL's error handling and process lifecycle model:

  1. FATAL vs ERROR exit paths: PostgreSQL's ereport(FATAL) — triggered here by ProcDiePending when pg_terminate_backend() sends SIGTERM — takes a fundamentally different code path than normal error handling. Critically, FATAL exits bypass PG_FINALLY blocks. This is by design: PG_FINALLY is implemented via sigsetjmp/siglongjmp and only executes on ereport(ERROR) which longjmps back to the error handler. FATAL calls proc_exit() directly, never returning through the longjmp path. Any cleanup logic registered in PG_FINALLY is therefore silently skipped.

  2. Background worker lifecycle coupling: The REPACK decoding worker is a background worker (launched via RegisterDynamicBackgroundWorker()) whose lifecycle is logically coupled to its launching backend. However, background workers are independent processes managed by the postmaster. There is no built-in mechanism for the postmaster to automatically terminate a background worker when its launching backend exits — this coupling must be explicitly managed by the code.

  3. Replication slot pinning: The decoding worker holds a temporary replication slot (RS_TEMPORARY). While temporary slots are designed to be dropped when the owning process exits, the slot remains active as long as the worker process is alive. An active replication slot prevents WAL recycling, meaning this leak can cause unbounded WAL accumulation — a serious production concern that can fill disk and cause a cluster-wide outage.

Reproduction and Impact

The reproduction is straightforward:

This is a real operational hazard: DBAs routinely use pg_terminate_backend() to kill long-running sessions, and REPACK on large tables is exactly the kind of operation that tends to get killed.

Proposed Solution and Its Flaw

The v1 Patch Approach

Baji Shaik's patch registers an on_proc_exit() callback when the decoding worker is started. This callback calls TerminateBackgroundWorker() to signal the worker to shut down. The design deliberately avoids calling WaitLatch() or any blocking wait inside the callback, because the proc_exit callback runs late in process shutdown where many subsystems (including the latch infrastructure) may no longer be safe to use. Instead, it relies on the worker eventually exiting and the RS_TEMPORARY slot being auto-dropped.

This is architecturally sound in principle: on_proc_exit() callbacks are invoked during FATAL exits (since FATAL calls proc_exit()), unlike PG_FINALLY blocks. This is the standard PostgreSQL pattern for ensuring cleanup happens regardless of exit path.

The Segfault Problem

Sami Imseih's testing reveals the patch causes a segmentation fault (signal 11) during the termination sequence, which cascades into a full postmaster restart (all connections killed, crash recovery triggered). This is far worse than the original bug.

The segfault strongly suggests a use-after-free or null pointer dereference in the on_proc_exit callback. The most likely causes are:

  1. BackgroundWorkerHandle lifetime: The BackgroundWorkerHandle obtained from RegisterDynamicBackgroundWorker() may be allocated in a memory context that has already been destroyed by the time the on_proc_exit callback runs. During proc_exit(), memory contexts are torn down in a specific order, and if the handle was in a transaction-scoped or subtransaction-scoped context, it would be freed before late-stage on_proc_exit callbacks execute.

  2. Shared memory access after detach: TerminateBackgroundWorker() accesses shared memory structures (BackgroundWorkerSlot in the BackgroundWorkerArray). If shared memory has already been partially detached by the time the callback runs, this would segfault.

  3. Callback ordering: on_proc_exit callbacks run in LIFO order. If other cleanup callbacks have already invalidated state that the new callback depends on, the result is undefined behavior.

Sami's comment that "on_proc_exit() is not the right place" suggests the callback should likely be registered via before_shmem_exit() instead, which runs earlier in the shutdown sequence while shared memory is still fully accessible. Alternatively, the BackgroundWorkerHandle needs to be copied into TopMemoryContext to ensure it survives context teardown.

Unresolved Design Questions

  1. before_shmem_exit() vs on_proc_exit(): The fix likely needs before_shmem_exit() since TerminateBackgroundWorker() accesses shared memory. But before_shmem_exit callbacks must be careful not to do things that assume full transaction state.

  2. Race conditions: Even with a correct callback, there's a window between the worker being signaled and actually exiting. If the postmaster also restarts (e.g., due to the segfault in v1), the worker might be killed by SIGQUIT anyway, but under normal operation, the fire-and-forget approach means the slot remains briefly active after the launching backend exits.

  3. Should REPACK use bgw_notify_pid or WL_POSTMASTER_DEATH?: A more robust design might have the decoding worker itself monitor its launching backend's liveness (e.g., via the process latch or a shared memory flag) and self-terminate when the launcher disappears. This would be defense-in-depth rather than relying solely on the launcher's exit callback.

  4. Broader pattern concern: This bug pattern (FATAL bypassing PG_FINALLY cleanup of background workers) likely affects other features that launch background workers from user backends. The fix should ideally establish a reusable pattern.

Current State

The thread is in early review. The v1 patch correctly identifies the root cause and the right general approach (proc_exit callback), but the implementation has a critical bug causing server crashes. A v2 patch is needed that either uses before_shmem_exit() or ensures the BackgroundWorkerHandle is in a long-lived memory context. The discussion is still active with only the initial report and one review.