Dynamic Logical Decoding Activation Without Server Restart
Core Problem
PostgreSQL's wal_level parameter controls how much information is written to WAL records. Setting it to logical enables logical decoding but increases WAL volume, potentially impacting performance. Prior to this work, changing wal_level required a server restart — a significant operational burden for users who start with wal_level=replica and later need logical replication.
The fundamental architectural challenge is that WAL records must contain logical decoding information before logical decoding can begin, but the system must ensure consistency: no logical decoder should ever encounter WAL records that lack the required information. This creates a multi-step activation protocol with careful synchronization requirements.
Evolution of Design Approaches
Approach 1: SQL Function API (Original PoC, Dec 2024)
The initial proposal introduced pg_activate_logical_decoding() and pg_deactivate_logical_decoding() functions. This used a tri-state status: DISABLED → XLOG_LOGICALINFO → READY. The middle state allowed WAL-logging of logical info while keeping logical decoding itself disabled until all in-progress transactions completed.
Rejected because: Cloud providers dislike multiple configuration methods; it creates confusion when SHOW wal_level displays 'replica' while logical info is being written to WAL.
Approach 2: Automatic Activation via Slot Creation (Jan 2025)
Bertrand Drouvot proposed hiding everything behind logical slot creation/deletion — creating the first logical slot activates logical decoding, dropping the last deactivates it.
Drawback identified: Users wanting logical decoding only on standbys would need to maintain a dummy logical slot on the primary, which was deemed awkward.
Approach 3: wal_level as PGC_SIGHUP (Jan-Feb 2025)
This approach made wal_level a SIGHUP-reloadable parameter. A background worker ("wal_level control worker") would handle the transition, using ProcSignalBarrier for synchronization.
Rejected because: Supporting all transition combinations (especially minimal↔replica) adds enormous complexity for rare use cases. Transitions to/from 'minimal' require checkpoints, walsender termination, and archiver shutdown.
Approach 4: Two-GUC System (Apr 2025)
This approach proposed two parameters: max_wal_level (POSTMASTER) and wal_level (SIGHUP). The former caps the highest level the server may reach; the latter controls the level at runtime.
Rejected because: Introduces user confusion with two parameters controlling what used to be one concept.
Final Design: Automatic with effective_wal_level (Committed Dec 2025)
The committed approach automatically increases effective WAL level to 'logical' when the first logical slot is created and decreases it back to 'replica' when the last valid logical slot is dropped or invalidated. A read-only GUC effective_wal_level shows the runtime state.
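The relationship between the configured wal_level, the set of valid logical slots, and effective_wal_level can be illustrated with a small model. This is a minimal sketch, not PostgreSQL code: the Server class and its method names are hypothetical, and it ignores the short window during which deactivation is still pending.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy model of a server; the class and its methods are hypothetical."""
    wal_level: str = "replica"                        # configured GUC
    valid_logical_slots: set = field(default_factory=set)

    @property
    def effective_wal_level(self) -> str:
        # wal_level=logical always decodes; wal_level=replica is raised to
        # 'logical' only while at least one valid logical slot exists.
        if self.wal_level == "logical":
            return "logical"
        if self.wal_level == "replica" and self.valid_logical_slots:
            return "logical"
        return self.wal_level

    def create_logical_slot(self, name: str) -> None:
        if self.wal_level == "minimal":
            raise ValueError("logical slots need wal_level >= replica")
        self.valid_logical_slots.add(name)

    def drop_or_invalidate_slot(self, name: str) -> None:
        self.valid_logical_slots.discard(name)
```

In this model, creating the first slot flips effective_wal_level to 'logical', and it returns to 'replica' only once the last slot is gone.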
Key Technical Architecture
Activation Protocol (EnsureLogicalDecodingEnabled)
- Acquire LogicalDecodingControlLock in exclusive mode
- Set xlog_logical_info = true in shared memory
- Write XLOG_LOGICAL_DECODING_STATUS_CHANGE WAL record (for standby replication)
- Set logical_decoding_enabled = true
- Release lock
- Emit PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO and wait for all processes to acknowledge
The barrier ensures all processes update their local XLogLogicalInfo cache before any logical decoding begins.
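The ordering of the steps above can be sketched as a toy simulation. Everything here is an assumption made for illustration: a threading.Lock stands in for LogicalDecodingControlLock, a list stands in for WAL, and one short-lived thread per backend stands in for a process absorbing the barrier. It follows the step order as listed in this summary and is not the server's implementation.

```python
import threading


class LogicalDecodingControl:
    """Toy stand-in for the shared-memory control state (hypothetical)."""

    def __init__(self, n_backends: int):
        self.control_lock = threading.Lock()   # models LogicalDecodingControlLock
        self.xlog_logical_info = False
        self.logical_decoding_enabled = False
        self.wal = []                           # models the WAL stream
        # Per-process cached copy of xlog_logical_info.
        self.local_xlog_logical_info = [False] * n_backends

    def ensure_logical_decoding_enabled(self) -> None:
        with self.control_lock:
            if self.logical_decoding_enabled:
                return                          # already active, nothing to do
            self.xlog_logical_info = True
            self.wal.append(("XLOG_LOGICAL_DECODING_STATUS_CHANGE", True))
            self.logical_decoding_enabled = True
        # The lock is released before waiting, so a backend that needs the
        # lock can still make progress and acknowledge the barrier.
        self._emit_barrier_and_wait()

    def _emit_barrier_and_wait(self) -> None:
        workers = [
            threading.Thread(target=self._absorb_barrier, args=(i,))
            for i in range(len(self.local_xlog_logical_info))
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()                            # "wait for all processes"

    def _absorb_barrier(self, backend_id: int) -> None:
        # Each backend refreshes its local cache from shared state.
        self.local_xlog_logical_info[backend_id] = self.xlog_logical_info
```

Calling ensure_logical_decoding_enabled() twice writes only one status-change record, since the second call returns early under the lock.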
Deactivation Protocol (DisableLogicalDecoding)
Deactivation is performed lazily by the checkpointer process. When the last logical slot is dropped:
- RequestDisableLogicalDecoding() sets pending_disable = true and wakes the checkpointer
- The checkpointer calls DisableLogicalDecodingIfNecessary():
  - Acquires LogicalDecodingControlLock
  - Verifies no valid logical slots exist
  - Disables logical_decoding_enabled
  - Writes STATUS_CHANGE WAL record
  - Disables xlog_logical_info
  - Releases lock
  - Emits barrier signal (without waiting)
Why lazy? Deactivation during process exit (e.g., temporary slot cleanup) is problematic because the process holds interrupts and writing WAL + waiting for concurrent operations could cause deadlocks or hangs.
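The request/perform split can be sketched as follows. This is a single-threaded illustration under stated assumptions: the DecodingState class and both function names are hypothetical, the lock and the barrier are omitted, and a list stands in for WAL. The point it shows is that the exiting process only sets a flag, while the checkpointer re-checks for slots before disabling anything.

```python
class DecodingState:
    """Toy shared state; field names follow the text, the class is hypothetical."""

    def __init__(self):
        self.logical_decoding_enabled = True
        self.xlog_logical_info = True
        self.pending_disable = False
        self.valid_logical_slots = 0
        self.wal = []


def request_disable(state: DecodingState) -> None:
    # Cheap and safe during process exit: no WAL write, no waiting.
    # (The real function also wakes the checkpointer.)
    state.pending_disable = True


def checkpointer_disable_if_necessary(state: DecodingState) -> bool:
    """Return True if logical decoding was actually disabled."""
    if not state.pending_disable:
        return False
    state.pending_disable = False
    if state.valid_logical_slots > 0:
        return False            # a slot was created after the request
    state.logical_decoding_enabled = False
    state.wal.append(("STATUS_CHANGE", False))
    state.xlog_logical_info = False
    return True
```

Because the slot check happens at the time the checkpointer runs, a slot created between the request and the checkpointer's wakeup keeps logical decoding enabled.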
Transaction-Level Cache (XLogLogicalInfoXactCache)
A critical design decision: XLogLogicalInfoActive() caches its result per-transaction to prevent inconsistent WAL records within a single transaction. The cache is:
- Set on first call within a transaction
- Cleared at transaction end via AtEOXact_LogicalDecoding()
- Updated immediately for non-transactional operations
This prevents scenarios where some WAL records of a single transaction carry logical info and others do not, which could confuse decoders.
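The caching behaviour can be sketched with a small model. The Backend class below is hypothetical, and it reads the flag from a plain dict, whereas the text describes each process keeping a local copy that is refreshed through the barrier. What the sketch shows is only the per-transaction pinning of the answer.

```python
class Backend:
    """Toy backend with a per-transaction cache (hypothetical model)."""

    def __init__(self, shared: dict):
        self.shared = shared            # e.g. {"xlog_logical_info": False}
        self.in_transaction = False
        self._xact_cache = None         # models XLogLogicalInfoXactCache

    def begin(self) -> None:
        self.in_transaction = True

    def xlog_logical_info_active(self) -> bool:
        if not self.in_transaction:
            # Non-transactional callers always see the current value.
            return self.shared["xlog_logical_info"]
        if self._xact_cache is None:
            # First call in this transaction pins the answer.
            self._xact_cache = self.shared["xlog_logical_info"]
        return self._xact_cache

    def end_of_xact(self) -> None:
        # Models AtEOXact_LogicalDecoding(): forget the pinned value.
        self.in_transaction = False
        self._xact_cache = None
```

A transaction that started before activation keeps seeing the old value until it ends, so all of its WAL records are written under the same setting.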
End-of-Recovery Handling (Promotion)
During standby promotion, UpdateLogicalDecodingStatusEndOfRecovery() determines the new logical decoding state based on:
- wal_level setting
- Presence of valid logical slots
A key race condition exists: between when the startup process updates the status and when SharedRecoveryState becomes RECOVERY_STATE_DONE, backends might try to create/drop slots. The solution uses RecoveryInProgress() checks to block slot operations during this window.
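One plausible reading of the promotion logic is sketched below. Both function names are hypothetical, and the exact decision rule is an assumption inferred from the two inputs named above; the source does not spell it out.

```python
def decoding_enabled_after_promotion(wal_level: str,
                                     has_valid_logical_slot: bool) -> bool:
    """Assumed rule: decide the logical decoding state at end of recovery."""
    if wal_level == "logical":
        return True
    # With wal_level=replica, decoding stays on only if a slot still needs it.
    return wal_level == "replica" and has_valid_logical_slot


def may_change_decoding_status(recovery_in_progress: bool) -> bool:
    """Assumed guard: backends must not change the status while recovery
    is still reported as in progress, which covers the window between the
    startup process's update and recovery being marked done."""
    return not recovery_in_progress
```

Under this reading, a promoted standby with wal_level=replica and no valid logical slots comes up with logical decoding disabled.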
Standby Behavior
- Standbys inherit logical decoding status from the primary via WAL replay of STATUS_CHANGE records
- effective_wal_level on standbys reflects the primary's state
- Logical slots on standbys are invalidated when the primary disables logical decoding
- The slotsync worker is signaled via SIGUSR1 to the postmaster when logical decoding is enabled
Critical Race Conditions Identified and Resolved
- Create/Drop Slot Race: Concurrent slot creation and dropping could leave logical decoding disabled with a slot present. Resolved by acquiring LogicalDecodingControlLock before checking/modifying status.
- Promotion Window Race: Between UpdateLogicalDecodingStatusEndOfRecovery() and recovery completion, backends could create slots that see stale status. Resolved by checking RecoveryInProgress() and blocking status changes.
- Signal Barrier + Lock Deadlock: Holding LogicalDecodingControlLock while waiting for signal barrier responses caused deadlocks. Resolved by releasing the lock before WaitForProcSignalBarrier().
- XLogLogicalInfoActive() Inconsistency: Without transaction-level caching, ExecuteTruncate() could Assert-fail because XLogLogicalInfoActive() changed mid-operation. Resolved with XLogLogicalInfoXactCache.
Post-Commit Controversy: Security/Privilege Concerns
Matthias van de Meent raised concerns that REPLICATION-privileged users can now effectively control wal_level by creating logical slots, increasing system-wide WAL overhead without DBA consent.
The consensus response (from Andres Freund, Amit Kapila, and Sawada):
- REPLICATION privilege already grants extraordinary power (reading all data, holding back horizons)
- Any user with DML privileges can already generate arbitrary WAL volume
- The incremental risk is minimal compared to existing REPLICATION capabilities
- If needed, max_logical_replication_slots or monitoring improvements are better solutions
The open item was proposed to be closed as "Non-bugs" in May 2026.
Performance Considerations
Benchmarks showed no noticeable regression from the patch itself (simple transaction throughput ~21,789 TPS with patch vs ~21,739 TPS without). The actual WAL volume impact comes from enabling logical-level WAL writing, which is the same overhead as wal_level=logical — the patch just makes the transition dynamic rather than requiring a restart.
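To put the two throughput figures in proportion, the relative difference can be computed directly. The variable names are illustrative; the numbers are the ones quoted above.

```python
tps_with_patch = 21_789
tps_without_patch = 21_739

# Relative change of the patched build against the unpatched baseline.
relative_change_pct = (
    (tps_with_patch - tps_without_patch) / tps_without_patch * 100
)
print(f"{relative_change_pct:.2f}%")  # prints "0.23%"
```

A difference of about 0.23% in favour of the patched build is consistent with the "no noticeable regression" reading.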
Impact on Existing Tools
- pg_upgrade: Relaxed check from requiring wal_level=logical to wal_level >= replica
- pg_createsubscriber: Removed unnecessary wal_level=logical check
- pg_controldata: Updated to show logicalDecodingEnabled field
- Slot sync worker: Now launches based on effective_wal_level rather than wal_level
- Monitoring tools: Should check effective_wal_level instead of wal_level
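For monitoring code, the change could look like the sketch below. It assumes the tool has already fetched server settings into a dict of name to value (for example from pg_settings); the function name is hypothetical, and the fallback to wal_level for servers that do not report effective_wal_level is a design assumption, not something stated in the thread.

```python
def logical_decoding_available(settings: dict) -> bool:
    """Return True if the server currently writes logical-level WAL.

    Prefers effective_wal_level; falls back to wal_level when the
    server does not report the newer setting.
    """
    level = settings.get("effective_wal_level", settings.get("wal_level"))
    return level == "logical"


# A server configured with wal_level=replica but with a logical slot present:
print(logical_decoding_available(
    {"wal_level": "replica", "effective_wal_level": "logical"}
))  # prints "True"
```

Checking wal_level alone would report this server as unable to decode, which is the mismatch the new setting is meant to remove.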