Analysis: effective_wal_level Not Decreasing After REPACK (CONCURRENTLY)
Core Problem
This thread identifies a bug in the interaction between PostgreSQL's new dynamic WAL level toggling feature and the REPACK (CONCURRENTLY) command. The issue sits at the intersection of two recent commits:
- 67c2097: Introduced dynamic toggling of logical decoding, allowing effective_wal_level to automatically decrease from logical to replica when no logical replication slots exist.
- 28d534e: Related infrastructure for the REPACK CONCURRENTLY feature, which uses temporary logical replication slots internally.
The Architectural Problem
When REPACK (CONCURRENTLY) executes, it creates a temporary logical replication slot to perform online table reorganization using logical decoding. This slot creation triggers the system to elevate effective_wal_level to logical (as seen in the log message: "logical decoding is enabled upon creating a new logical replication slot").
However, when REPACK completes and drops the temporary slot during cleanup (repack_cleanup_logical_decoding), it does not call RequestDisableLogicalDecoding() to signal the checkpointer that logical decoding is no longer needed. The result is that effective_wal_level remains stuck at logical even though no logical slots exist anymore, requiring a full server restart to restore the lower WAL level.
This matters architecturally because:
- WAL volume: At logical level, WAL records contain additional information needed for logical decoding (e.g., full tuple data for UPDATE/DELETE via wal_level = logical). This increases WAL volume, I/O, and replication bandwidth.
- Performance: Unnecessary logical-level WAL generation imposes overhead on every write transaction until the server is restarted.
- Design contract violation: The entire purpose of the dynamic WAL level toggling feature (67c2097) is to automatically manage the WAL level based on actual slot presence. A transient internal slot should not permanently elevate the WAL level.
Proposed Solution
The patch is straightforward: add a call to RequestDisableLogicalDecoding() inside repack_cleanup_logical_decoding() immediately after the replication slot is dropped. This signals the checkpointer process to evaluate whether logical decoding can be disabled (i.e., check if any logical slots remain). If none exist, the checkpointer will lower effective_wal_level back to replica without requiring a restart.
This follows the same pattern that should be used by any code path that drops logical replication slots — it must coordinate with the dynamic toggling infrastructure by requesting a re-evaluation of the WAL level.
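The difference between the unpatched and patched cleanup can be sketched with a small toy model. This is not PostgreSQL code (the real implementation is in C); ToyServer, repack_cleanup, checkpointer_pass, and the slot name repack_tmp are hypothetical names made up for illustration. Only the idea of a disable request that the checkpointer later acts on is taken from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy stand-in for server state; not PostgreSQL's actual data structures."""
    configured_wal_level: str = "replica"
    effective_wal_level: str = "replica"
    logical_slots: set = field(default_factory=set)
    disable_requested: bool = False

    def create_logical_slot(self, name: str) -> None:
        # Creating a logical slot raises the effective level to logical.
        self.logical_slots.add(name)
        self.effective_wal_level = "logical"

    def drop_slot(self, name: str) -> None:
        # Dropping a slot by itself does not change the effective level.
        self.logical_slots.discard(name)

    def request_disable_logical_decoding(self) -> None:
        # Models RequestDisableLogicalDecoding(): it only records a request.
        self.disable_requested = True

    def checkpointer_pass(self) -> None:
        # The checkpointer acts only on a pending request, and lowers the
        # level only if no logical slots remain.
        if self.disable_requested:
            self.disable_requested = False
            if not self.logical_slots:
                self.effective_wal_level = self.configured_wal_level


def repack_cleanup(server: ToyServer, slot: str, patched: bool) -> None:
    # Hypothetical analogue of repack_cleanup_logical_decoding().
    server.drop_slot(slot)
    if patched:
        # The proposed fix: request re-evaluation right after the drop.
        server.request_disable_logical_decoding()


def run_repack(patched: bool) -> str:
    # Create the temporary slot, clean up, then let the checkpointer run once.
    server = ToyServer()
    server.create_logical_slot("repack_tmp")
    repack_cleanup(server, "repack_tmp", patched)
    server.checkpointer_pass()
    return server.effective_wal_level
```

In this model, run_repack(patched=False) leaves the level at logical because the checkpointer never sees a request, while run_repack(patched=True) returns it to replica.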
Key Design Considerations
- Symmetry of enable/disable: The slot creation path already calls the enable side (automatically via ReplicationSlotCreate or similar), but the REPACK cleanup path was missing the corresponding disable request. This is a classic symmetry bug in resource lifecycle management.
- Checkpointer-mediated approach: The disable request is intentionally asynchronous: it asks the checkpointer to handle the transition. This avoids holding up the REPACK command and ensures the WAL level change happens at a safe checkpoint boundary.
- Broader audit needed: This bug raises the question of whether other code paths that create and drop temporary logical slots (e.g., pg_logical_emit_message testing, custom extensions using the logical decoding API) also need similar fixes.
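One general way to avoid this kind of symmetry bug is to tie the release step to the acquire step so that it cannot be skipped, even on an error path. The sketch below shows the idea with a Python context manager. It is only an illustration of the pattern, not how PostgreSQL structures its C code; temporary_logical_slot and the state dictionary are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def temporary_logical_slot(state: dict, name: str):
    # Acquire: register the slot and record that decoding was enabled.
    state["slots"].add(name)
    state["events"].append("enable")
    try:
        yield name
    finally:
        # Release: always drop the slot and record the disable request,
        # whether the body finished normally or raised.
        state["slots"].discard(name)
        state["events"].append("request_disable")


def demo() -> list:
    state = {"slots": set(), "events": []}
    try:
        with temporary_logical_slot(state, "tmp_slot"):
            raise RuntimeError("simulated failure during reorganization")
    except RuntimeError:
        pass
    return state["events"]
```

Here demo() records both the enable and the disable request even though the body failed, which is the pairing the REPACK cleanup path was missing.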
Technical Context
The effective_wal_level GUC was introduced as a runtime-computed value distinct from the configured wal_level to support dynamic transitions. The infrastructure works as follows:
- When a logical slot is created → effective_wal_level is raised to logical
- When the last logical slot is dropped → RequestDisableLogicalDecoding() should be called → checkpointer evaluates and potentially lowers effective_wal_level
- The actual WAL level change takes effect at the next checkpoint
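The rules above can be condensed into a single decision function. This is a simplified sketch under the assumptions stated in this analysis, not the server's actual logic; checkpointer_decision and its parameters are hypothetical names.

```python
def checkpointer_decision(
    effective_level: str,
    configured_level: str,
    logical_slot_count: int,
    disable_requested: bool,
) -> str:
    """Return the effective WAL level after one checkpointer pass (toy model)."""
    # Lower the level only when a disable was requested and no logical
    # slots remain; otherwise leave the current level untouched.
    if disable_requested and logical_slot_count == 0:
        return configured_level
    return effective_level


# Each case: (effective, configured, logical slot count, disable requested)
cases = {
    "last slot dropped, request sent": ("logical", "replica", 0, True),
    "request sent, another slot remains": ("logical", "replica", 1, True),
    "slot dropped, no request (the REPACK bug)": ("logical", "replica", 0, False),
    "wal_level configured as logical": ("logical", "logical", 0, True),
}

results = {label: checkpointer_decision(*args) for label, args in cases.items()}
```

Only the first case lowers the level to replica. The third case is the situation described in this thread: no slot remains, but without a request nothing prompts the re-evaluation.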
The REPACK CONCURRENTLY feature uses logical decoding internally to capture changes made to a table while it's being reorganized (similar to how pg_reorg/pg_repack extensions work, but now built-in). It creates a slot, decodes changes, applies them to the new table copy, then drops the slot — a pattern that should be fully transparent to the WAL level management system.