Replication Slot Leak on Error in SQL-Callable Slot Functions
The Core Problem
PostgreSQL tracks the currently-acquired replication slot via a process-global variable MyReplicationSlot in src/backend/replication/slot.c. The invariant is strict: before any slot acquisition, ReplicationSlotAcquire() asserts MyReplicationSlot == NULL (slot.c:638), and callers are responsible for pairing every acquisition with ReplicationSlotRelease().
For the replication-protocol walsender path this invariant is trivially maintained because a walsender connection is tightly scoped and its error handler (WalSndErrorCleanup) knows to release the slot. However, the SQL-callable surface — pg_replication_slot_advance(), pg_logical_slot_get_changes(), pg_logical_slot_peek_changes(), pg_create_logical_replication_slot(), pg_copy_logical_replication_slot(), etc. — runs inside a regular backend executing a normal SQL statement. The cleanup for those functions is written as straight-line code: acquire, do work, release. If an ERROR is thrown between acquire and release, the top-level AbortTransaction path for regular backends does not reset MyReplicationSlot through a registered callback; the slot functions rely on the explicit release call being reached.
In an ordinary top-level transaction this bug is masked: the backend terminates the transaction, ProcKill/ReplicationSlotCleanup on backend exit reclaims things, and a reconnection starts fresh. The bug becomes visible and dangerous when the error is trapped by a PL/pgSQL EXCEPTION block, which implements savepoints via subtransactions. The subtransaction is rolled back, control returns to the PL/pgSQL frame, and the session continues — but MyReplicationSlot still points at the slot that was being operated on. From the slot's perspective, active_pid is still set to this backend.
Consequences:
- Assert builds: the next slot-acquiring call in the session trips Assert(MyReplicationSlot == NULL) and crashes the backend.
- Release builds: the assert is compiled out. ReplicationSlotAcquire silently overwrites the stale pointer. The previous slot is now orphaned: its active_pid is still this backend's PID, so no other session can acquire it. The slot holds back catalog_xmin/restart_lsn, which blocks VACUUM from removing dead tuples in catalog relations and prevents WAL segment recycling. Left unattended, this is an availability/disk-space incident.
This is effectively a resource-leak-on-error bug at the boundary between the replication slot subsystem and the normal executor's error handling model.
Why the Architecture Permits This
There are two established patterns in PostgreSQL for guaranteeing cleanup across an ERROR longjmp:
- PG_ENSURE_ERROR_CLEANUP / on_shmem_exit / before_shmem_exit callbacks: used for process-lifetime resources. ReplicationSlotCleanup() is wired here, which is why backend exit is safe.
- Resource owners and PG_TRY/PG_CATCH: used for statement/transaction-scoped resources (buffer pins, relcache refs, tuple descriptors, etc.).
Replication slots historically sit awkwardly between these. They are conceptually session-scoped for walsenders (long-lived, one slot per connection) but become statement-scoped when exposed as SQL functions. There is no ResourceOwner integration for slot acquisition, and no subtransaction abort callback (RegisterSubXactCallback) that clears MyReplicationSlot. The SQL-callable wrappers were written assuming the body could not throw, or that any throw would terminate the session — an assumption that PL/pgSQL exception blocks violate.
The Proposed Fix
Satya's patch takes the minimally invasive approach: wrap the error-prone region of each affected SQL-callable function in PG_TRY { ... } PG_CATCH { ReplicationSlotRelease(); PG_RE_THROW(); } PG_END_TRY(). This guarantees that on any error path the global is cleared and the slot's active_pid is reset before the error propagates to the PL/pgSQL exception handler.
Implications and tradeoffs of this approach:
- Scope discipline: every SQL-callable entry point that touches ReplicationSlotAcquire must be audited and wrapped. Missing one reintroduces the bug. Files typically involved: src/backend/replication/slotfuncs.c (the SQL wrappers) and the logical decoding entry points in src/backend/replication/logical/logicalfuncs.c.
- Correctness under nested errors: PG_CATCH runs in the error context; calling ReplicationSlotRelease there is safe because it only touches shared-memory state under the ReplicationSlotControlLock and does not allocate or do I/O that could itself throw in a way that would corrupt state. The PG_RE_THROW() preserves the original error.
- Alternative designs not taken:
  - A subtransaction abort callback (RegisterSubXactCallback) that calls ReplicationSlotRelease() if MyReplicationSlot != NULL. This is more systemic (one registration covers all present and future SQL-callable slot functions) and mirrors how other subsystems handle cleanup (e.g., the AtEOSubXact_* routines).
  - Resource owner integration: make slot acquisition register with CurrentResourceOwner so that ResourceOwnerRelease during abort frees it. This is the most "PostgreSQL-idiomatic" fix but requires more intrusive changes to the slot API.
  - A top-level AbortTransaction hook (like AtEOXact_*). This would fix the top-level case but not the subtransaction case cleanly unless paired with the subxact variant.
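For comparison, the subtransaction-callback alternative would look roughly like the following. This is a hedged, non-compilable sketch (it needs the PostgreSQL backend headers and is not part of the actual patch); RegisterSubXactCallback and SUBXACT_EVENT_ABORT_SUB are the real xact.c API, but the callback name here is invented:

```c
/* Sketch only: release any slot left acquired when a subtransaction
 * aborts, so a PL/pgSQL EXCEPTION block cannot strand MyReplicationSlot.
 * The callback name is illustrative, not from the patch. */
static void
ReplicationSlotSubXactCallback(SubXactEvent event,
                               SubTransactionId mySubid,
                               SubTransactionId parentSubid,
                               void *arg)
{
    if (event == SUBXACT_EVENT_ABORT_SUB && MyReplicationSlot != NULL)
        ReplicationSlotRelease();
}

/* Registered once per backend, e.g. at startup: */
RegisterSubXactCallback(ReplicationSlotSubXactCallback, NULL);
```

The appeal is that a single registration covers every current and future SQL-callable slot function; the cost is reasoning about whether an unconditional release is correct for every caller (walsenders included), which is part of why the per-site PG_TRY wrap is easier to back-patch.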
The PG_TRY approach is pragmatic and back-patchable; a callback-based approach would be cleaner architecturally but is a larger change and riskier to back-patch. Reviewers on pgsql-hackers have historically preferred the callback approach for similar leak classes (cf. the pattern used for LockReleaseCurrentOwner and the AtEOXact callbacks), so it is likely this patch will see pushback toward that direction.
Reproducer Analysis
The reproducer is elegant because it exploits pg_replication_slot_advance on a freshly-created slot where the requested LSN '0/1' is behind the slot's current confirmed_flush_lsn, which raises an ERROR after the slot has been acquired. The PL/pgSQL EXCEPTION WHEN others catches it, leaving MyReplicationSlot dangling. The next pg_logical_slot_get_changes call then trips the assert. Any function that errors between ReplicationSlotAcquire and ReplicationSlotRelease would exhibit the same issue — this is a class of bug, not a single site.
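A sketch of that reproducer in SQL, following the description above (slot name and output plugin are illustrative choices, not mandated by the report):

```sql
-- Assumes a logical slot using the contrib test_decoding plugin.
SELECT pg_create_logical_replication_slot('leak_demo', 'test_decoding');

DO $$
BEGIN
  -- '0/1' is behind the fresh slot's confirmed_flush_lsn, so this
  -- errors after the slot has already been acquired...
  PERFORM pg_replication_slot_advance('leak_demo', '0/1');
EXCEPTION WHEN others THEN
  -- ...and trapping the error here leaves MyReplicationSlot dangling.
  RAISE NOTICE 'trapped: %', SQLERRM;
END $$;

-- On unpatched assert builds this next acquisition crashes the backend;
-- on release builds it silently overwrites the stale pointer.
SELECT * FROM pg_logical_slot_get_changes('leak_demo', NULL, NULL);
```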
Severity and Back-Patch Considerations
This is a latent data-availability bug in release builds: an orphaned logical slot will pin catalog_xmin indefinitely, causing catalog bloat and potentially unbounded WAL retention (for physical slots or logical slots with restart_lsn). Detection requires noticing that pg_replication_slots.active_pid points at a live backend that isn't actually using the slot, or observing disk-space growth. The fix is a strong candidate for back-patching to all supported branches since the defect has existed since SQL-callable slot functions were introduced.
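One way to hunt for the orphaned-slot symptom described above is to cross-check each claimed slot against what its claiming backend is actually doing; a backend that "holds" a slot while sitting idle in plain SQL is the tell. A sketch (column names are the real pg_replication_slots / pg_stat_activity columns; the interpretation is manual):

```sql
-- Slots with a claiming PID, joined to that backend's current activity.
SELECT s.slot_name, s.active_pid, a.state, a.query,
       s.restart_lsn, s.catalog_xmin
FROM pg_replication_slots s
LEFT JOIN pg_stat_activity a ON a.pid = s.active_pid
WHERE s.active_pid IS NOT NULL;
```

An orphaned slot shows a live, non-replication backend (or a NULL join, if the PID is gone) while restart_lsn and catalog_xmin stop advancing.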