Monthly Summary: Release Replication Slot on Error in SQL-Callable Slot Functions (May 2026)
Problem Statement
PostgreSQL's SQL-callable replication slot functions (pg_replication_slot_advance, pg_logical_slot_get_changes, pg_logical_slot_peek_changes, pg_logical_slot_get_binary_changes, pg_logical_slot_peek_binary_changes, pg_create_logical_replication_slot, pg_copy_logical_replication_slot) can leak an acquired replication slot when an ERROR is thrown between ReplicationSlotAcquire() and ReplicationSlotRelease().
The bug manifests when the error is caught by a PL/pgSQL EXCEPTION block (which uses subtransactions). The session continues with MyReplicationSlot still pointing at the acquired slot, meaning:
- Assert builds: the next slot-acquiring call crashes the backend (Assert(MyReplicationSlot == NULL) at slot.c:638)
- Release builds: the previous slot becomes permanently orphaned: its active_pid remains set, blocking other sessions from acquiring it and pinning catalog_xmin/restart_lsn, which prevents VACUUM and causes unbounded WAL retention
The root cause is architectural: replication slots have no ResourceOwner integration and no subtransaction abort callback. The SQL-callable wrappers assume straight-line execution to the release call — an assumption violated by exception handling.
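The leak mechanism can be reproduced in a toy model outside PostgreSQL. The sketch below is not backend code: ToySlot, my_slot, toy_error and the other names are hypothetical stand-ins, and setjmp/longjmp merely imitates how an ERROR jumps to the nearest handler. It assumes a single slot and a single session.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy stand-ins, NOT PostgreSQL code: one slot and one session-global pointer. */
typedef struct ToySlot { int active_pid; } ToySlot;

static ToySlot  toy_slot = {0};
static ToySlot *my_slot = NULL;   /* models MyReplicationSlot */
static jmp_buf *handler = NULL;   /* models the innermost error handler */

/* Models ereport(ERROR, ...): jump to the innermost handler, never return. */
static void toy_error(void)
{
    assert(handler != NULL);
    longjmp(*handler, 1);
}

static void toy_acquire(int pid) { toy_slot.active_pid = pid; my_slot = &toy_slot; }
static void toy_release(void)    { my_slot->active_pid = 0;   my_slot = NULL; }

/* Models an unpatched SQL-callable wrapper: acquire, work, release in a straight line. */
static void unpatched_advance(int pid, int fail)
{
    toy_acquire(pid);
    if (fail)
        toy_error();              /* jumps past toy_release() */
    toy_release();
}

/*
 * Models a PL/pgSQL EXCEPTION block: the error is caught and the session
 * carries on. Returns 1 if the slot is still held afterwards (leaked).
 */
static int call_inside_exception_block(int pid, int fail)
{
    jmp_buf  env;
    jmp_buf *saved = handler;

    handler = &env;
    if (setjmp(env) == 0)
        unpatched_advance(pid, fail);
    handler = saved;
    return my_slot != NULL;
}
```

In this model, call_inside_exception_block(42, 1) returns with my_slot still set and toy_slot.active_pid still 42, which corresponds to the orphaned-slot state described for release builds.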
Proposed Fix
The patch wraps the error-prone regions of all affected SQL-callable functions in PG_TRY(); { ... } PG_CATCH(); { ReplicationSlotRelease(); PG_RE_THROW(); } PG_END_TRY();, guaranteeing that the slot is released on any error path before the error propagates to exception handlers.
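The catch, release, re-throw shape can be illustrated with the same kind of toy model. This is a sketch under assumptions, not the patch itself: all names are hypothetical, setjmp/longjmp stands in for the PG_TRY machinery, and the comments mark which macro each step loosely corresponds to.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy stand-ins, NOT PostgreSQL code. */
typedef struct ToySlot { int active_pid; } ToySlot;

static ToySlot  toy_slot = {0};
static ToySlot *my_slot = NULL;   /* models MyReplicationSlot */
static jmp_buf *handler = NULL;   /* models the innermost error handler */

/* Models ereport(ERROR, ...): jump to the innermost handler, never return. */
static void toy_error(void)
{
    assert(handler != NULL);
    longjmp(*handler, 1);
}

static void toy_acquire(int pid) { toy_slot.active_pid = pid; my_slot = &toy_slot; }
static void toy_release(void)    { my_slot->active_pid = 0;   my_slot = NULL; }

/* Patched shape: any error after acquisition releases the slot, then propagates. */
static void patched_advance(int pid, int fail)
{
    jmp_buf  env;
    jmp_buf *outer = handler;

    handler = &env;                   /* roughly PG_TRY(); */
    if (setjmp(env) == 0)
    {
        toy_acquire(pid);
        if (fail)
            toy_error();              /* error between acquire and release */
        handler = outer;              /* leave the protected region */
        toy_release();                /* normal path */
    }
    else                              /* roughly PG_CATCH(); */
    {
        handler = outer;
        if (my_slot != NULL)
            toy_release();
        toy_error();                  /* roughly PG_RE_THROW(); */
    }
}

/* Models the PL/pgSQL EXCEPTION block; returns 1 if an error reached it. */
static int run_in_exception_block(int pid, int fail)
{
    jmp_buf  env;
    jmp_buf *saved = handler;

    handler = &env;
    if (setjmp(env) != 0)
    {
        handler = saved;
        return 1;
    }
    patched_advance(pid, fail);
    handler = saved;
    return 0;
}
```

Here run_in_exception_block(7, 1) still reports the error to the outer handler, but by the time it arrives my_slot is already NULL, so the session can safely acquire another slot.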
Patch Evolution (v1 → v4)
| Version | Date | Key Changes |
|---|---|---|
| v1 | Pre-May | Initial PG_TRY/PG_CATCH wrapping for pg_replication_slot_advance |
| v2 | 2026-05-25 | Extended to all 7 affected functions; addressed Fujii's review (NULL guard, acquire inside PG_TRY, temporary slot drop in error path) |
| v3 | 2026-05-25 | Fixed invalid test for pg_copy_logical_replication_slot (uses nonexistent_plugin to trigger error after acquisition); added test coverage for pg_logical_slot_get_changes_guts() |
| v4 | 2026-05-26 | Cosmetic cleanup; patch format fixes |
Key Review Issues Resolved
- Temporary slot leak (Fujii): ReplicationSlotRelease() doesn't auto-drop RS_TEMPORARY slots, so the error path must explicitly drop them
- NULL guard (Fujii): PG_CATCH must check MyReplicationSlot != NULL before calling release
- Acquire placement (Fujii): ReplicationSlotAcquire() itself can throw after setting the global, so it must be inside PG_TRY
- Invalid test case (Shveta): Original test for pg_copy_logical_replication_slot failed before acquisition; replaced with nonexistent_plugin trigger
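The three error-path points (temporary slot drop, NULL guard, acquire inside the protected region) can be exercised together in a toy model. The sketch below is an illustration only: every name is hypothetical, the failure points are simulated with an enum, and it assumes the slot is created by the same call that fails. For brevity the guarded function returns 1 where real code would re-throw.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT PostgreSQL code. */
typedef struct ToySlot { int active_pid; bool temporary; bool exists; } ToySlot;

static ToySlot  toy_slot = {0, false, false};
static ToySlot *my_slot = NULL;   /* models MyReplicationSlot */
static jmp_buf *handler = NULL;   /* models the innermost error handler */

/* Where the simulated failure happens. */
typedef enum { FAIL_NONE, FAIL_BEFORE_GLOBAL_SET, FAIL_AFTER_GLOBAL_SET, FAIL_IN_WORK } FailPoint;

static void toy_error(void)
{
    assert(handler != NULL);
    longjmp(*handler, 1);
}

/* Creation plus acquisition; can fail before or after setting the global. */
static void toy_create_and_acquire(int pid, bool temporary, FailPoint fp)
{
    if (fp == FAIL_BEFORE_GLOBAL_SET)
        toy_error();                          /* nothing acquired yet */
    toy_slot.exists = true;
    toy_slot.temporary = temporary;
    toy_slot.active_pid = pid;
    my_slot = &toy_slot;
    if (fp == FAIL_AFTER_GLOBAL_SET)
        toy_error();                          /* global already set */
}

/* Release keeps the slot on disk, temporary or not. */
static void toy_release(void) { my_slot->active_pid = 0; my_slot = NULL; }

/* Drop removes the currently acquired slot entirely. */
static void toy_drop_acquired(void)
{
    my_slot->exists = false;
    my_slot->active_pid = 0;
    my_slot = NULL;
}

/* Error-path cleanup: NULL guard first, then drop temporary slots, release others. */
static void cleanup_on_error(void)
{
    if (my_slot == NULL)
        return;
    if (my_slot->temporary)
        toy_drop_acquired();
    else
        toy_release();
}

/* Returns 1 on error (real code would re-throw instead), 0 on success. */
static int guarded_create(int pid, bool temporary, FailPoint fp)
{
    jmp_buf  env;
    jmp_buf *outer = handler;

    handler = &env;
    if (setjmp(env) != 0)
    {
        handler = outer;
        cleanup_on_error();
        return 1;
    }
    toy_create_and_acquire(pid, temporary, fp);   /* inside the protected region */
    if (fp == FAIL_IN_WORK)
        toy_error();
    handler = outer;
    toy_release();
    return 0;
}
```

With FAIL_BEFORE_GLOBAL_SET the guard makes cleanup a no-op; with FAIL_AFTER_GLOBAL_SET the slot is released only because acquisition sits inside the protected region; and a temporary slot that fails in FAIL_IN_WORK is dropped rather than left behind.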
Current Status
The patch is converging. Shveta's final review of v3/v4 raised only cosmetic issues and explicitly stated no further technical objections. The patch covers all affected entry points in both slotfuncs.c and logicalfuncs.c. The fix is considered back-patchable to all supported branches.
Alternative Designs Discussed (Not Taken)
- RegisterSubXactCallback: More systemic (one registration covers all functions) but larger change, riskier to back-patch
- ResourceOwner integration: Most idiomatic PostgreSQL pattern but requires intrusive slot API changes
- AtEOXact_* hook: Doesn't cleanly handle subtransaction case
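To make the "one registration covers all functions" point concrete, here is a toy model of the callback-based alternative. It is a sketch under assumptions and does not use the real RegisterSubXactCallback API: the registry, toy_abort_subtransaction and the callback signature are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT PostgreSQL code. */
typedef struct ToySlot { int active_pid; } ToySlot;

static ToySlot  toy_slot = {0};
static ToySlot *my_slot = NULL;   /* models MyReplicationSlot */

/* Simplified callback type; the real subtransaction callback takes more arguments. */
typedef void (*ToySubXactAbortCallback)(void *arg);

#define MAX_CALLBACKS 4
static struct { ToySubXactAbortCallback fn; void *arg; } callbacks[MAX_CALLBACKS];
static int n_callbacks = 0;

static void toy_register_subxact_abort_callback(ToySubXactAbortCallback fn, void *arg)
{
    assert(n_callbacks < MAX_CALLBACKS);
    callbacks[n_callbacks].fn = fn;
    callbacks[n_callbacks].arg = arg;
    n_callbacks++;
}

/* Models subtransaction abort: every registered callback runs. */
static void toy_abort_subtransaction(void)
{
    for (int i = 0; i < n_callbacks; i++)
        callbacks[i].fn(callbacks[i].arg);
}

/* The single cleanup hook: release whatever slot the session still holds. */
static void release_slot_on_subxact_abort(void *arg)
{
    (void) arg;
    if (my_slot != NULL)
    {
        my_slot->active_pid = 0;
        my_slot = NULL;
    }
}

static void toy_acquire(int pid) { toy_slot.active_pid = pid; my_slot = &toy_slot; }

/* Registered once; no per-function wrapping is needed afterwards. */
static void toy_module_init(void)
{
    toy_register_subxact_abort_callback(release_slot_on_subxact_abort, NULL);
}
```

After toy_module_init(), any toy function that acquires a slot and then fails is cleaned up by toy_abort_subtransaction() without its own error handling. A real callback would presumably also have to decide whether the slot was acquired inside the aborting subtransaction or by an outer one, which is part of why this route is a larger change.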