Fix the race condition for updating slot minimum LSN

First seen: 2026-01-27 06:32:31+00:00 · Messages: 3 · Participants: 3

Latest Update

2026-05-18 · claude-opus-4-6

Core Problem

This thread addresses a race condition in PostgreSQL's replication slot WAL reservation mechanism that can lead to premature WAL removal and subsequent slot invalidation. The bug exists in the interaction between slot creation, slot advancement, and checkpoint processing.

Architectural Context

PostgreSQL's replication slots guarantee that WAL files required by a consumer (logical or physical replication) are retained. The system maintains a global "minimum LSN" (replication_slot_minimum_lsn) that represents the oldest WAL position any slot still needs. During checkpoints, WAL segments older than this minimum are eligible for removal.

The minimum LSN is computed by ReplicationSlotsComputeRequiredLSN(), which scans all slots and takes the minimum of their valid restart_lsn values (slots whose restart_lsn is InvalidXLogRecPtr are skipped). The result is then published through XLogSetReplicationSlotMinimumLSN(), which atomically updates the global minimum LSN value that checkpoints consult via XLogGetReplicationSlotMinimumLSN().
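
The scan described above can be illustrated with a small standalone model. This is only a sketch, not PostgreSQL source: ToySlot and toy_compute_required_lsn are hypothetical names, the LSN type is reduced to a plain 64-bit integer, and all locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for PostgreSQL's LSN type and its "no LSN" marker. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, heavily reduced slot: only the fields the scan needs. */
typedef struct ToySlot
{
    int        in_use;       /* nonzero if the slot is allocated */
    XLogRecPtr restart_lsn;  /* oldest WAL position the slot still needs */
} ToySlot;

/*
 * Return the smallest valid restart_lsn across all in-use slots, or
 * InvalidXLogRecPtr if no slot currently holds back WAL.
 */
static XLogRecPtr
toy_compute_required_lsn(const ToySlot *slots, size_t nslots)
{
    XLogRecPtr min_required = InvalidXLogRecPtr;

    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        XLogRecPtr lsn = slots[i].restart_lsn;

        /* Unused slots and slots without a reservation are skipped. */
        if (!slots[i].in_use || lsn == InvalidXLogRecPtr)
            continue;

        if (min_required == InvalidXLogRecPtr || lsn < min_required)
            min_required = lsn;
    }

    return min_required;
}
```

The skip on InvalidXLogRecPtr is the detail that matters for the race: a slot that exists but has not yet stored its restart_lsn contributes nothing to the minimum.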

The Race Condition

The race involves three concurrent operations:

  1. Backend A — Creating a new slot, specifically in ReplicationSlotReserveWal() where it determines and sets the slot's initial restart_lsn.
  2. Backend B — Advancing an existing slot and calling ReplicationSlotsComputeRequiredLSN() followed by XLogSetReplicationSlotMinimumLSN().
  3. Checkpoint process — Reading the global minimum LSN to determine which WAL to remove.

The dangerous interleaving:

| Step | Action | State |
|------|--------|-------|
| 1 | Backend A creates slot s, determines its restart_lsn = LSN_old but hasn't written it yet | Global min LSN is stale |
| 2 | Backend B advances slot advtest to LSN_new (much newer), then calls ReplicationSlotsComputeRequiredLSN() | This scans slots; slot s either has InvalidXLogRecPtr (skipped) or hasn't been updated yet |
| 3 | Backend A writes restart_lsn = LSN_old to slot s and calls ReplicationSlotsComputeRequiredLSN() → sets global min to LSN_old | Correct momentarily |
| 4 | Backend B's XLogSetReplicationSlotMinimumLSN() executes after Backend A's, overwriting global min with LSN_new | Global min now too recent |
| 5 | Checkpoint reads global min = LSN_new, removes WAL segments before it | WAL needed by slot s at LSN_old is removed |
| 6 | Slot s is invalidated because its required WAL no longer exists | Data loss for consumer |

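The fundamental issue is a TOCTOU (time-of-check to time-of-use) problem: the computation of the minimum LSN and the update of the global minimum are not atomic with respect to slot restart_lsn modifications, so a result computed from an older view of the slots can be published after a result computed from a newer view.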
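
Steps 1 through 5 of the interleaving can be replayed deterministically in a single thread. The sketch below is a toy model under stated assumptions: ToyState, toy_scan, toy_publish and toy_replay_race are hypothetical names, slot index 0 plays the role of advtest and index 1 the new slot s, and the scan and publish steps are modeled as two separate calls so they can be interleaved by hand.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;                 /* toy LSN type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define NSLOTS 2

typedef struct ToyState
{
    XLogRecPtr restart_lsn[NSLOTS];  /* [0] = advtest, [1] = new slot s */
    XLogRecPtr global_min;           /* value the checkpointer would read */
} ToyState;

/* Scan step: minimum over slots that have a valid restart_lsn. */
static XLogRecPtr
toy_scan(const ToyState *st)
{
    XLogRecPtr min_required = InvalidXLogRecPtr;

    for (int i = 0; i < NSLOTS; i++)
    {
        XLogRecPtr lsn = st->restart_lsn[i];

        if (lsn == InvalidXLogRecPtr)
            continue;
        if (min_required == InvalidXLogRecPtr || lsn < min_required)
            min_required = lsn;
    }
    return min_required;
}

/* Publish step: overwrite the global minimum with a computed value. */
static void
toy_publish(ToyState *st, XLogRecPtr value)
{
    st->global_min = value;
}

/*
 * Replay the unsafe interleaving. Returns 1 if the published minimum ends
 * up newer than the restart_lsn of slot s, i.e. WAL that slot s needs would
 * be eligible for removal.
 */
static int
toy_replay_race(XLogRecPtr lsn_old, XLogRecPtr lsn_new)
{
    ToyState   st = {{lsn_old, InvalidXLogRecPtr}, lsn_old};
    XLogRecPtr a_choice, a_min, b_min;

    assert(lsn_old != InvalidXLogRecPtr && lsn_old < lsn_new);

    a_choice = lsn_old;            /* step 1: A picks LSN_old, not stored yet */
    st.restart_lsn[0] = lsn_new;   /* step 2: B advances advtest ...          */
    b_min = toy_scan(&st);         /*         ... and scans; slot s is skipped */
    st.restart_lsn[1] = a_choice;  /* step 3: A stores restart_lsn ...        */
    a_min = toy_scan(&st);
    toy_publish(&st, a_min);       /*         ... and publishes LSN_old       */
    toy_publish(&st, b_min);       /* step 4: B's stale result overwrites it  */

    /* step 5: a checkpoint would now treat WAL before global_min as removable */
    return st.global_min > st.restart_lsn[1];
}
```

Calling toy_replay_race(100, 900) reports the unsafe outcome: the global minimum ends at 900 while slot s still needs WAL from 100.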

Proposed Solution

Approach: Serialization via ReplicationSlotControlLock

The patch applies the same pattern used in commit 2a5225b (which fixed an analogous race for slot_xmin updates):

  1. Acquire ReplicationSlotControlLock in exclusive mode when updating slot.restart_lsn during WAL reservation in ReplicationSlotReserveWal().
  2. Place XLogSetReplicationSlotMinimumLSN() under ReplicationSlotControlLock protection — specifically, the lock must be held from the point where ReplicationSlotsComputeRequiredLSN() scans slots through the point where XLogSetReplicationSlotMinimumLSN() writes the global minimum.
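
The effect of holding one lock across the scan and the publish can be shown with the same kind of toy model. This is a sketch of the pattern, not the patch itself: the lock is a single-threaded flag standing in for ReplicationSlotControlLock held in exclusive mode, and ToyState, toy_lock, toy_unlock, toy_compute_and_publish, toy_reserve_wal, toy_advance and toy_final_min are all hypothetical names. Because each critical section runs to completion, the two backends can only execute in one of two orders, and both are checked.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;                 /* toy LSN type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define NSLOTS 2

typedef struct ToyState
{
    int        lock_held;            /* stand-in for an exclusive lock */
    XLogRecPtr restart_lsn[NSLOTS];  /* [0] = advtest, [1] = new slot s */
    XLogRecPtr global_min;           /* value the checkpointer would read */
} ToyState;

static void toy_lock(ToyState *st)   { assert(!st->lock_held); st->lock_held = 1; }
static void toy_unlock(ToyState *st) { assert(st->lock_held);  st->lock_held = 0; }

/* Scan and publish as one unit; the caller must hold the lock. */
static void
toy_compute_and_publish(ToyState *st)
{
    XLogRecPtr min_required = InvalidXLogRecPtr;

    assert(st->lock_held);
    for (int i = 0; i < NSLOTS; i++)
    {
        XLogRecPtr lsn = st->restart_lsn[i];

        if (lsn == InvalidXLogRecPtr)
            continue;
        if (min_required == InvalidXLogRecPtr || lsn < min_required)
            min_required = lsn;
    }
    st->global_min = min_required;
}

/* Backend A: store restart_lsn and republish the minimum under the lock. */
static void
toy_reserve_wal(ToyState *st, int slot, XLogRecPtr lsn)
{
    toy_lock(st);
    st->restart_lsn[slot] = lsn;
    toy_compute_and_publish(st);
    toy_unlock(st);
}

/* Backend B: advance a slot, then scan and publish under the lock. */
static void
toy_advance(ToyState *st, int slot, XLogRecPtr lsn)
{
    st->restart_lsn[slot] = lsn;
    toy_lock(st);
    toy_compute_and_publish(st);
    toy_unlock(st);
}

/* Run both backends in the chosen order and return the final global minimum. */
static XLogRecPtr
toy_final_min(int reserve_first, XLogRecPtr lsn_old, XLogRecPtr lsn_new)
{
    ToyState st = {0, {lsn_old, InvalidXLogRecPtr}, lsn_old};

    if (reserve_first)
    {
        toy_reserve_wal(&st, 1, lsn_old);
        toy_advance(&st, 0, lsn_new);
    }
    else
    {
        toy_advance(&st, 0, lsn_new);
        toy_reserve_wal(&st, 1, lsn_old);
    }
    return st.global_min;
}
```

In both orders the final published minimum equals LSN_old, because whichever section runs last scans a slot array that already contains slot s's restart_lsn. The model only covers the final published value; it does not model how the reservation chooses its LSN or what a checkpoint does between the two sections.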

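This serialization ensures that a minimum computed before slot s received its restart_lsn cannot be published after a minimum that already accounts for it.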

Files Modified

Design Tradeoffs

Lock contention: ReplicationSlotControlLock is already used for slot creation/deletion and xmin computation. Adding another exclusive acquisition during WAL reservation increases contention, but WAL reservation happens only at slot creation, which is infrequent.

Alternative not taken: One could imagine a version counter or retry loop, but the LWLock approach is simpler, proven (by the xmin precedent), and the performance impact is negligible given slot creation frequency.

Analysis of copy_replication_slot() Safety

Surya Poondla raises an important question about whether copy_replication_slot() in slotfuncs.c has the same vulnerability. The analysis concludes it does not, for two reasons:

  1. No InvalidXLogRecPtr window: When copying a slot, create_logical_replication_slot() is called with a valid src_restart_lsn. Inside CreateInitDecodingContext(), because restart_lsn is already valid, ReplicationSlotReserveWal() is skipped entirely. The slot's restart_lsn is set directly to src_restart_lsn.

  2. Monotonicity guarantee: The code errors out if copy_restart_lsn < src_restart_lsn, so the write never moves restart_lsn backward. Any concurrent scan will see a valid LSN that is at least as old as what the source slot had.

This is a valid analysis — the race requires a window where a slot exists but has InvalidXLogRecPtr as its restart_lsn, causing scanners to skip it. The copy path never creates such a window.
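
The two properties of the copy path can be sketched as follows. This is an illustrative model only: ToySlot and toy_copy_restart_lsn are hypothetical names, and the second LSN argument stands for the source slot's restart_lsn as re-read after the new slot has been created.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;                 /* toy LSN type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef struct ToySlot
{
    XLogRecPtr restart_lsn;
} ToySlot;

/*
 * Model of the copy path. Returns 0 on success and -1 if the re-read source
 * position is older than the position the copy started from.
 */
static int
toy_copy_restart_lsn(ToySlot *dst, XLogRecPtr src_restart_lsn,
                     XLogRecPtr copy_restart_lsn)
{
    assert(src_restart_lsn != InvalidXLogRecPtr);

    /* The new slot starts with a valid LSN; no reservation step runs. */
    dst->restart_lsn = src_restart_lsn;

    /* Refuse to move restart_lsn backward. */
    if (copy_restart_lsn < src_restart_lsn)
        return -1;

    dst->restart_lsn = copy_restart_lsn;
    return 0;
}
```

At no point in this model does the destination slot hold InvalidXLogRecPtr, so a concurrent scan of the kind described earlier would never skip it.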

Relationship to Prior Work

This fix is directly related to commit 2a5225b, which fixed the analogous race for slot_xmin updates.

The consistency of approach (same lock, same pattern) is architecturally sound — it creates a uniform contract that any modification to slot state that affects global minima must be serialized against the computation of those minima.