2026-05-18 · claude-opus-4-6

Fix the Race Condition for Updating Slot Minimum LSN

Core Problem

This thread addresses a race condition in PostgreSQL's replication slot WAL reservation mechanism that can lead to premature WAL removal and subsequent slot invalidation. The bug exists in the interaction between slot creation, slot advancement, and checkpoint processing.

Architectural Context

PostgreSQL's replication slots guarantee that WAL files required by a consumer (logical or physical replication) are retained. The system maintains a global "minimum LSN" (replication_slot_minimum_lsn) that represents the oldest WAL position any slot still needs. During checkpoints, WAL segments older than this minimum are eligible for removal.

The minimum LSN is computed by ReplicationSlotsComputeRequiredLSN(), which scans all slots and takes the minimum of their valid restart_lsn values. It then passes the result to XLogSetReplicationSlotMinimumLSN(), which stores the global minimum LSN that checkpoints read via XLogGetReplicationSlotMinimumLSN(). The scan and the store are two separate steps.
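The scan described above can be illustrated with a small Python model. This is a sketch, not PostgreSQL code: the Slot class, the INVALID_LSN constant (standing in for InvalidXLogRecPtr), and the compute_required_lsn name are assumptions made for illustration.

```python
from dataclasses import dataclass

INVALID_LSN = 0  # hypothetical stand-in for InvalidXLogRecPtr


@dataclass
class Slot:
    name: str
    restart_lsn: int = INVALID_LSN
    in_use: bool = True


def compute_required_lsn(slots):
    """Return the oldest valid restart_lsn, or INVALID_LSN if no slot reserves WAL."""
    min_required = INVALID_LSN
    for slot in slots:
        # Unused slots and slots that have not reserved WAL yet are skipped.
        if not slot.in_use or slot.restart_lsn == INVALID_LSN:
            continue
        if min_required == INVALID_LSN or slot.restart_lsn < min_required:
            min_required = slot.restart_lsn
    return min_required
```

Note that a slot whose restart_lsn is still invalid contributes nothing to the result, which is the property the race below depends on.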

The Race Condition

The race involves three concurrent operations:

Backend A — Creating a new slot, specifically in ReplicationSlotReserveWal() where it determines and sets the slot's initial restart_lsn.
Backend B — Advancing an existing slot and calling ReplicationSlotsComputeRequiredLSN() followed by XLogSetReplicationSlotMinimumLSN().
Checkpoint process — Reading the global minimum LSN to determine which WAL to remove.

The dangerous interleaving:

Step	Action	State
1	Backend A creates slot `s`, determines its `restart_lsn = LSN_old` but hasn't written it yet	Global min LSN is stale
2	Backend B advances slot `advtest` to `LSN_new` (much newer), then begins `ReplicationSlotsComputeRequiredLSN()` and scans the slots	The scan skips slot `s`, whose `restart_lsn` is still InvalidXLogRecPtr, so B computes `LSN_new` as the minimum
3	Backend A writes `restart_lsn = LSN_old` to slot `s` and calls `ReplicationSlotsComputeRequiredLSN()` → sets global min to `LSN_old`	Correct momentarily
4	Backend B's `XLogSetReplicationSlotMinimumLSN()` executes after Backend A's, overwriting global min with `LSN_new`	Global min now too recent
5	Checkpoint reads global min = `LSN_new`, removes WAL segments before it	WAL needed by slot `s` at `LSN_old` is removed
6	Slot `s` is invalidated because its required WAL no longer exists	Data loss for consumer

The fundamental issue is a time-of-check to time-of-use (TOCTOU) problem: computing the minimum LSN and storing it as the global minimum are not atomic with respect to modifications of a slot's restart_lsn, so a backend can publish a minimum computed from a scan that is already out of date.
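The interleaving in the table can be replayed deterministically. The following is a minimal Python model, not PostgreSQL code: the numeric LSN values, the dictionary of slots, and the names compute_min and replay_race are all illustrative assumptions.

```python
INVALID_LSN = 0            # hypothetical stand-in for InvalidXLogRecPtr
LSN_OLD, LSN_NEW = 100, 900


def compute_min(slots):
    """Scan step: oldest valid restart_lsn, skipping slots with no reservation."""
    valid = [lsn for lsn in slots.values() if lsn != INVALID_LSN]
    return min(valid) if valid else INVALID_LSN


def replay_race():
    slots = {"advtest": 50, "s": INVALID_LSN}
    global_min = 50

    # Step 1: Backend A picks LSN_OLD for slot s but has not stored it yet.
    pending_a = LSN_OLD

    # Step 2: Backend B advances advtest and scans; slot s is skipped.
    slots["advtest"] = LSN_NEW
    b_result = compute_min(slots)

    # Step 3: Backend A stores its restart_lsn, then scans and publishes.
    slots["s"] = pending_a
    global_min = compute_min(slots)

    # Step 4: Backend B publishes the result of its earlier, now stale, scan.
    global_min = b_result

    return slots, global_min
```

In this model the function returns a global minimum equal to LSN_NEW while slot s still holds LSN_OLD, which is the state a checkpoint observes in step 5.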

Proposed Solution

Approach: Serialization via ReplicationSlotControlLock

The patch applies the same pattern used in commit 2a5225b (which fixed an analogous race for slot_xmin updates):

Acquire ReplicationSlotControlLock in exclusive mode when updating slot.restart_lsn during WAL reservation in ReplicationSlotReserveWal().
Place XLogSetReplicationSlotMinimumLSN() under ReplicationSlotControlLock protection — specifically, the lock must be held from the point where ReplicationSlotsComputeRequiredLSN() scans slots through the point where XLogSetReplicationSlotMinimumLSN() writes the global minimum.

This serialization ensures that:

If Backend A is writing a new restart_lsn, Backend B's scan in ReplicationSlotsComputeRequiredLSN() either runs after the write and sees the new value, or runs before it, in which case Backend B also stores its result via XLogSetReplicationSlotMinimumLSN() before Backend A can proceed, and Backend A's own recomputation then restores the correct older minimum.
The global minimum LSN can never be advanced past a value that a concurrent slot creation is trying to reserve.
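The effect of the serialization can be sketched with a small Python model. It is not the patch itself: the SlotState class, its method names, and the LSN values are assumptions, and a single threading.Lock stands in for ReplicationSlotControlLock without modelling the shared and exclusive modes separately. The sketch also assumes that the reserving backend holds the lock across both its write and its recomputation.

```python
import threading
from itertools import permutations

INVALID_LSN = 0            # hypothetical stand-in for InvalidXLogRecPtr
LSN_OLD, LSN_NEW = 100, 900


class SlotState:
    def __init__(self):
        self.control_lock = threading.Lock()  # models ReplicationSlotControlLock
        self.restart_lsn = {"advtest": 50, "s": INVALID_LSN}
        self.global_min = 50

    def _compute_and_publish(self):
        # Caller must hold control_lock: scan and store form one critical section.
        valid = [lsn for lsn in self.restart_lsn.values() if lsn != INVALID_LSN]
        self.global_min = min(valid) if valid else INVALID_LSN

    def reserve_wal(self, name, lsn):
        # Backend A: write restart_lsn and recompute under the lock.
        with self.control_lock:
            self.restart_lsn[name] = lsn
            self._compute_and_publish()

    def advance(self, name, lsn):
        # Backend B: the scan and the store can no longer be split apart.
        self.restart_lsn[name] = lsn
        with self.control_lock:
            self._compute_and_publish()


def final_minimum(order):
    state = SlotState()
    ops = {
        "A": lambda: state.reserve_wal("s", LSN_OLD),
        "B": lambda: state.advance("advtest", LSN_NEW),
    }
    for step in order:
        ops[step]()
    return state.global_min


results = {order: final_minimum(order) for order in permutations("AB")}
```

Because each critical section is indivisible in this model, the possible interleavings reduce to the two orderings of A and B, and both leave the global minimum at LSN_OLD.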

Files Modified

slot.c — Adding exclusive ReplicationSlotControlLock acquisition around restart_lsn updates in ReplicationSlotReserveWal().
slotsync.c — Similar protection for slot synchronization paths that update restart_lsn.

Design Tradeoffs

Lock contention: ReplicationSlotControlLock is already used for slot creation/deletion and xmin computation. Adding another exclusive acquisition during WAL reservation increases contention, but:

Slot creation is infrequent relative to normal operations
The critical section is short (just the LSN write + computation)
This mirrors the accepted approach from commit 2a5225b

Alternative not taken: One could imagine a version counter or retry loop, but the LWLock approach is simpler and follows the xmin precedent, and its performance impact should be negligible given how rarely slots are created.

Analysis of copy_replication_slot() Safety

Surya Poondla raises an important question about whether copy_replication_slot() in slotfuncs.c has the same vulnerability. The analysis concludes it does not, for two reasons:

No InvalidXLogRecPtr window: When copying a slot, create_logical_replication_slot() is called with a valid src_restart_lsn. Inside CreateInitDecodingContext(), because restart_lsn is already valid, ReplicationSlotReserveWal() is skipped entirely. The slot's restart_lsn is set directly to src_restart_lsn.
Monotonicity guarantee: The code errors out if copy_restart_lsn < src_restart_lsn, so the write never moves restart_lsn backward. Any concurrent scan will see a valid LSN that is at least as old as what the source slot had.

This is a valid analysis — the race requires a window where a slot exists but has InvalidXLogRecPtr as its restart_lsn, causing scanners to skip it. The copy path never creates such a window.
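The two properties of the copy path can be expressed as a short Python model. It is only a sketch of the argument above: the function name copy_slot_states and the idea of returning the sequence of values held by the new slot are assumptions for illustration, not the structure of copy_replication_slot().

```python
INVALID_LSN = 0  # hypothetical stand-in for InvalidXLogRecPtr


def copy_slot_states(src_restart_lsn, copy_restart_lsn):
    """Return the restart_lsn values the copied slot holds over its lifetime."""
    if src_restart_lsn == INVALID_LSN:
        raise ValueError("source slot has no reserved WAL")

    # The new slot is created with a valid restart_lsn from the start,
    # so a concurrent scan never skips it.
    states = [src_restart_lsn]

    # The copy is rejected if the value read later is behind the first one.
    if copy_restart_lsn < src_restart_lsn:
        raise ValueError("copied restart_lsn is behind the source restart_lsn")

    states.append(copy_restart_lsn)
    return states
```

Every value in the returned list is valid and the sequence never decreases, which is why a scanner cannot miss the copied slot in this model.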

Relationship to Prior Work

This fix is directly related to:

Commit 2a5225b: Fixed the analogous race for effective_xmin/effective_catalog_xmin updates, establishing the pattern of using ReplicationSlotControlLock for serialization.
The referenced thread about invalidation of newly created slots: The broader investigation that uncovered this specific race condition.

The consistency of approach (same lock, same pattern) is architecturally sound — it creates a uniform contract that any modification to slot state that affects global minima must be serialized against the computation of those minima.
