Overview
This is a single-message bug-report/patch proposal from Lakshmi addressing a misleading log message in PostgreSQL's logical replication slot synchronization subsystem — specifically in drop_local_obsolete_slots(), which is responsible for reaping "synced" slots on a standby that no longer exist on the primary.
While small in scope, the bug touches on a subtle class of concurrency issues that pervade the slot-sync machinery: the gap between enumerating shared-memory slot entries and acting on them, during which another backend can legitimately repurpose an entry.
The Core Problem
Background: Synced Slots on Standbys
In PostgreSQL 17+, logical replication slots can be synchronized from a primary to a physical standby (the "slot sync" feature, pg_sync_replication_slots() / sync_replication_slots GUC). This lets a logical subscriber fail over to a standby without losing its replication position. Synced slots on the standby are marked with ReplicationSlot.synced = true and are managed by the slotsync worker or explicit sync calls.
When a synced slot is no longer present on the primary (or no longer meets sync criteria), the standby must drop its local copy. This is the job of drop_local_obsolete_slots() in src/backend/replication/logical/slotsync.c. Its broad shape:
1. Walk ReplicationSlotCtl->replication_slots[] under shared lock.
2. Build a candidate list of slot indices (or names) whose corresponding entries are synced slots absent from the remote list.
3. Release the shared lock and, for each candidate, acquire the slot and call ReplicationSlotDropAcquired().
4. Emit a LOG message announcing the drop.
The Race
Between steps 2 and 3, ReplicationSlotControlLock is not held. In that window:
- Another backend can call ReplicationSlotDrop() on the synced slot (e.g., during reconfiguration), freeing the shmem entry.
- A user CREATE action (pg_create_logical_replication_slot, CREATE_REPLICATION_SLOT) can reuse the now-free ReplicationSlot array element, populating it with a brand-new, user-created slot: not synced, possibly in a different database, with a different name.
The existing code apparently has a guard: it re-checks synced after re-locking before calling the drop, so it correctly skips dropping the unrelated user slot. However, the surrounding ereport(LOG, ...) path still fires and references the slot name/dboid currently in the (now-unrelated) shmem entry. The result is a log line falsely claiming that a user's slot was "dropped" as part of obsolete-slot cleanup — when in reality nothing was dropped, and the named slot is alive and well.
Why This Matters
- Operational confusion. DBAs monitoring logs for replication slot drops would see phantom drops of slots they know exist. This erodes trust in log-based monitoring/alerting.
- Forensic noise. Post-incident analysis could chase a nonexistent drop event.
- Correctness of user-facing messaging. Although the underlying action is correct (no wrongful drop occurs), PostgreSQL's logging contract should not misreport actions.
This is a logging bug, not a data-corruption bug — but it is the kind of cosmetic artifact that hints at a structural issue: message construction is decoupled from the guard that determines whether the action actually happened.
The Proposed Fix
Lakshmi proposes two coordinated changes:
1. Gate the LOG on the same synced_slot check
Move the ereport(LOG, ...) inside the same branch as the actual drop, so the LOG only fires when the shmem entry is still a synced slot at the moment of action. If the entry has been repurposed, the code skips both the drop and the log line. This restores the invariant that the log message corresponds to a real action taken.
2. Snapshot NameStr(slot->data.name) and slot->data.database before ReplicationSlotDropAcquired()
This is the subtler half. ReplicationSlotDropAcquired() ultimately clears MyReplicationSlot->in_use and releases the slot's identity. Referencing slot->data.name or slot->data.database after the drop is a use-after-free-style pattern: the values may be zeroed, or (worse) the entry may be reallocated by another backend before the LOG call executes, making the message describe whatever slot next occupies that array element.
By copying the name (a NameData, 64 bytes) and the database OID into local variables before the drop call, the subsequent LOG is guaranteed to describe the entity that was actually dropped.
This is a standard PostgreSQL idiom: any time you log about a shared-memory object whose identity you have just destroyed or released, you snapshot its identifying fields on the stack first. Similar patterns exist throughout slot.c, lock.c, and twophase.c.
Technical Insights and Design Considerations
The Locking Discipline of Slot Sync
The slot-sync path deliberately avoids holding ReplicationSlotControlLock across drops because ReplicationSlotDropAcquired() performs I/O (removing the on-disk slot directory) and can be expensive. Holding the control lock during that would serialize all slot operations cluster-wide. The cost is the two-phase "enumerate then act" pattern, which requires re-validation at action time — exactly the pattern that the bug exposes.
The fix does not change this discipline; it simply ensures the messaging respects it.
Why a Re-check Is Already Needed (and Why It's Correct)
The existing re-check is presumably something like: after acquiring the slot, verify MyReplicationSlot->data.synced and that MyReplicationSlot->data.database == MyDatabaseId (or matches the snapshotted dbid) before proceeding with the drop. If a user slot has occupied the entry, synced will be false and the code bails out, correctly. The bug is that the emit-LOG path was apparently structured as an unconditional "we've finished the iteration" message rather than being tied to the action.
Alternative Designs Not Taken
One could imagine more invasive fixes:
- Hold the control lock across the drop. Rejected implicitly — hurts concurrency badly.
- Use slot "generation numbers" to detect reuse unambiguously. PostgreSQL's slot infrastructure does not track generations on ReplicationSlot entries; adding them would be a larger, semantically richer change, and overkill for a log-message bug.
- Identify slots by name rather than shmem index when building the candidate list. This would require re-scanning the slot array on each action, adding O(N²) behavior in slot count. The current index-based approach with re-validation is the right tradeoff.
The minimal, surgical fix Lakshmi proposes is appropriate for the bug's scope.
Backpatch Considerations (not stated in the email)
Slot sync landed in PG17. This fix is a candidate for backpatch to 17 and 18 since it is purely a logging correctness fix with no ABI or behavioral change beyond message emission. The snapshot-before-drop change is defensively correct regardless of whether a reviewer can construct the exact race.
Participant Analysis
Only one participant — Lakshmi (lakshmin.jhs@gmail.com) — posted. No committer or senior reviewer has yet weighed in as of this snapshot. The report is well-framed: it identifies the race window precisely, distinguishes the (correct) skip behavior from the (incorrect) log emission, and proposes a fix that targets both the gating and the stale-pointer hazard. This is the posture of a contributor familiar with slot.c conventions, though without visible committer track record in this thread.
Reviewers likely to engage (based on slot-sync authorship history): Amit Kapila, Hou Zhijie, Shveta Malik, Bertrand Drouvot — the core group behind the slot sync feature. Their scrutiny would likely focus on: (a) is the re-validation under the correct lock level; (b) is the snapshotted database OID used anywhere that should instead use the post-drop state; (c) is there a similar bug in the slotsync worker's other log sites (e.g., update_local_synced_slot, synchronize_slots).
Assessment
A correct, minimal bugfix for a real — if cosmetic — issue in the slot synchronization subsystem. The two-part fix (gate the log, snapshot the identifying fields) reflects good defensive coding practice. The bug itself is a textbook example of the hazard of logging about shared-memory objects across lock-drop boundaries, and its existence suggests similar audits may be warranted elsewhere in slotsync.c.