synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication

First seen: 2026-02-24 22:08:37+00:00 · Messages: 74 · Participants: 10

Latest Update

2026-05-14 · claude-opus-4-6

synchronized_standby_slots Behavior Inconsistent with Quorum-Based Synchronous Replication

Core Problem

PostgreSQL's synchronized_standby_slots GUC, introduced to support logical replication failover, enforces ALL-of-N semantics: every physical replication slot listed in the parameter must have caught up before a logical failover slot is permitted to proceed with decoding. This creates a fundamental availability mismatch with synchronous_standby_names, which supports ANY M-of-N (quorum) semantics.

The Architectural Inconsistency

In a typical 3-node HA deployment configured for quorum-based synchronous replication:

synchronous_standby_names = 'ANY 1 (standby1, standby2)'
synchronized_standby_slots = 'sb1_slot, sb2_slot'

If standby1 goes down, synchronous commits continue to succeed because standby2 satisfies the quorum. However, logical decoding blocks indefinitely in WaitForStandbyConfirmation(), waiting for sb1_slot to catch up — even though the transaction is already durably committed on a quorum of synchronous standbys. This defeats the availability guarantee the DBA intended by choosing quorum commit, and worse, can cause silent WAL accumulation on the primary leading to disk-full scenarios.

The root issue is that the two GUCs govern related but distinct concerns — commit durability (synchronous_standby_names) vs. logical slot advancement safety (synchronized_standby_slots) — yet have incompatible availability models.
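The availability gap comes down to the shape of the wait predicate. A minimal sketch (illustrative helper names, not the actual PostgreSQL code) contrasts the ALL-of-N check that synchronized_standby_slots enforces today with the ANY M-of-N quorum check that synchronous_standby_names already supports:

```c
#include <stdbool.h>

/* Sketch only: caught_up[i] records whether listed slot i has confirmed
 * the target LSN. Neither function is PostgreSQL source code. */
static bool
all_of_n_satisfied(const bool caught_up[], int n)
{
    for (int i = 0; i < n; i++)
        if (!caught_up[i])
            return false;       /* a single lagging slot blocks decoding */
    return true;
}

static bool
any_m_of_n_satisfied(const bool caught_up[], int n, int m)
{
    int confirmed = 0;

    for (int i = 0; i < n; i++)
        if (caught_up[i])
            confirmed++;
    return confirmed >= m;      /* any m of the n listed slots suffice */
}
```

With standby1 down, the ALL predicate never becomes true, while an `ANY 1` predicate is already satisfied by standby2, matching the commit-side quorum.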

Proposed Solution

Extending the GUC Syntax

The proposal extends synchronized_standby_slots to accept ANY M (slot1, slot2, ...) and FIRST N (slot1, slot2, ...) syntax, mirroring the grammar of synchronous_standby_names.
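For illustration, the extended forms might look like this (a sketch based on the thread's description; the exact accepted spellings are defined by the patch):

```
synchronized_standby_slots = 'ANY 1 (sb1_slot, sb2_slot)'    # quorum: any one slot caught up
synchronized_standby_slots = 'FIRST 1 (sb1_slot, sb2_slot)'  # priority: first available slot
synchronized_standby_slots = 'sb1_slot, sb2_slot'            # bare list: ALL (unchanged)
```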

The two GUCs remain separate because the set of slots to synchronize can differ from the synchronous standby list (e.g., a DBA might want to ensure a geo-distant standby catches up before allowing logical consumers to read changes).

Key Technical Debates

1. Quorum Safety and Failover Correctness

Ashutosh Sharma raised a critical concern early in the thread: with ANY 1 (sync_standby1, sync_standby2), if sync_standby1 is ahead and confirms WAL that gets forwarded to the logical replica, and then sync_standby1 dies forcing failover to sync_standby2, the new primary could be at a lower LSN than the logical replica. The logical replication slot would be stale.

Amit Kapila countered that this is the same situation that exists for synchronous_standby_names with quorum commit — the failover orchestrator is responsible for selecting the most-caught-up standby. The documentation at logical-replication-failover.html provides steps to identify which replica is safe for subscriber switchover. This argument carried the day, with Shveta Malik and Satya concurring that failover correctness is the orchestrator's responsibility, not the GUC's.

2. Defaulting to synchronous_standby_names

An earlier thread (referenced by Amit Kapila) proposed having synchronized_standby_slots default to SAME_AS_SYNCREP_STANDBYS. Alexander Kukushkin and Ashutosh Sharma both identified problems with coupling the two settings, and the idea was conclusively rejected, with quick consensus that the two GUCs must remain independently configured.

3. Parser Reuse vs. Local Helper Function

A significant design disagreement emerged between Ashutosh Sharma and Hou Zhijie about how to distinguish plain lists from explicit FIRST N (...) syntax.

The problem: The existing syncrep_yyparse grammar treats a bare list slot1, slot2 as FIRST 1 (slot1, slot2). But for synchronized_standby_slots, a bare list must mean ALL-mode (wait for all). The parser output is ambiguous.

Ashutosh's approach: Keep the shared parser untouched and add a local helper IsPrioritySyncStandbySlotsSyntax() that inspects the raw string to detect explicit FIRST keyword presence. This keeps changes localized and avoids risk to synchronous_standby_names behavior.

Hou Zhijie's approach: Modify the shared syncrep grammar to emit a new method SYNC_REP_IMPLICIT (later SYNC_REP_DEFAULT) for bare lists, making the parser itself distinguish the three forms. This eliminates the need for redundant string-parsing logic and avoids bugs like the one Ajin Cherian found where slot names starting with "first" (e.g., firstsub1) were misidentified as the FIRST keyword.
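The fragility of raw-string keyword detection is easy to demonstrate. The sketch below is hypothetical code, not the patch's actual helper: a naive prefix test misclassifies a slot named firstsub1, while a correct check must also demand a word boundary after the keyword.

```c
#include <stdbool.h>
#include <ctype.h>
#include <strings.h>            /* strncasecmp (POSIX) */

/* Naive check: any string starting with "first" looks like the keyword. */
static bool
naive_has_first_keyword(const char *s)
{
    return strncasecmp(s, "first", 5) == 0;
}

/* Boundary-aware check: "first" must be followed by whitespace or '('. */
static bool
boundary_has_first_keyword(const char *s)
{
    return strncasecmp(s, "first", 5) == 0 &&
           (isspace((unsigned char) s[5]) || s[5] == '(');
}
```

Even the boundary-aware version duplicates knowledge the grammar already has, which is the core of Hou Zhijie's argument for fixing the parser instead.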

Amit Kapila suggested splitting the patch to evaluate both approaches independently, which led to the final 3-patch series where the parser refactoring came first as 0001.

4. Slot State Tracking and Reporting

The patch introduces a SyncStandbySlotsState enum to classify slot conditions:

typedef enum {
    SS_SLOT_NOT_FOUND,          /* slot does not exist */
    SS_SLOT_LOGICAL,            /* slot is logical, not physical */
    SS_SLOT_INVALIDATED,        /* slot has been invalidated */
    SS_SLOT_INACTIVE_LAGGING,   /* inactive and behind */
    SS_SLOT_ACTIVE_LAGGING,     /* active but hasn't caught up */
} SyncStandbySlotsState;

A behavioral regression was caught by Shveta Malik: the initial patch treated all inactive slots as blocking, but HEAD code correctly allowed inactive slots that had already caught up (restart_lsn >= wait_for_lsn) to be counted as caught up. The fix split the inactive state into SS_SLOT_INACTIVE_LAGGING (blocking) and regular caught-up (non-blocking).
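The corrected rule can be sketched as follows (hypothetical function and a hypothetical SS_SLOT_CAUGHT_UP name for the non-blocking case; the lagging states mirror the patch's enum):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* WAL position (LSN), as in PostgreSQL */

typedef enum {
    SS_SLOT_CAUGHT_UP,          /* non-blocking (hypothetical name) */
    SS_SLOT_INACTIVE_LAGGING,   /* inactive and behind: blocks */
    SS_SLOT_ACTIVE_LAGGING,     /* active but behind: blocks */
} SlotWaitState;

/* A slot that has already reached wait_for_lsn never blocks, even when it
 * is inactive -- the behavior the initial patch accidentally regressed. */
static SlotWaitState
classify_slot(bool active, XLogRecPtr restart_lsn, XLogRecPtr wait_for_lsn)
{
    if (restart_lsn >= wait_for_lsn)
        return SS_SLOT_CAUGHT_UP;
    return active ? SS_SLOT_ACTIVE_LAGGING : SS_SLOT_INACTIVE_LAGGING;
}
```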

Reporting was moved to a dedicated ReportUnavailableSyncStandbySlots() function with actionable messages including LSN gap information. The log level for lagging slots was set to DEBUG1 rather than WARNING or ERROR, since during shutdown the walsender legitimately waits for standbys to catch up and WARNING messages would be noisy without being actionable.

5. Testing the SS_SLOT_ACTIVE_LAGGING Path

Creating a deterministic test for the "active but lagging" slot state proved surprisingly difficult.

The winning approach, suggested by Hou Zhijie and implemented by Ajin Cherian, uses psql as a replication client, issuing START_REPLICATION SLOT <slot_name> PHYSICAL <lsn>. This acquires the slot (making it active), but unlike a real WAL receiver, psql sends no standby status feedback, so restart_lsn never advances, creating a deterministic active-but-lagging condition. This reduced test execution time from 60-140 seconds to roughly 6 seconds.
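Roughly, the test does something like this (a sketch; the connection string, slot name, and LSN are illustrative):

```
# Open a replication-protocol connection with psql and start physical
# streaming on the slot. psql never sends standby status feedback, so the
# slot becomes active while restart_lsn stays put.
psql "dbname=postgres replication=true" \
     -c "START_REPLICATION SLOT sb1_slot PHYSICAL 0/1"
```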

Final Patch Structure

After extensive iteration, the patch was split into three parts at Amit Kapila's suggestion:

  1. 0001: Refactors the syncrep parser to introduce SYNC_REP_DEFAULT for bare standby lists, enabling callers to distinguish FIRST N (...), ANY N (...), and plain list forms
  2. 0002: Adds ANY N quorum semantics to synchronized_standby_slots
  3. 0003: Adds FIRST N and N (...) priority syntax support

This ordering ensures each patch is independently functional and reviewable.

Implications

This change is architecturally significant for PostgreSQL's logical replication failover story. Without quorum-aware synchronized_standby_slots, any deployment using quorum synchronous replication is forced to choose between reduced availability (listing every standby's slot, so a single failed standby blocks logical decoding and lets WAL accumulate) and weakened failover safety (leaving slots out of the list, so a logical subscriber can get ahead of the standby that is eventually promoted).

The patch resolves this by allowing the logical slot advancement policy to match the commit durability policy, which is the only configuration that makes operational sense in quorum-based HA deployments.