2026-06-01 · claude-opus-4-6

PGConf.dev 2026 CSN Unconference Session: Technical Analysis

The Core Problem: Commit Sequence Numbers and the Visibility/Durability Tension

This thread captures current thinking on one of PostgreSQL's most architecturally significant in-progress features: Commit Sequence Numbers (CSN). CSN is a long-discussed mechanism that would replace or augment PostgreSQL's current visibility system with a monotonically increasing sequence number assigned at commit time. Today, each snapshot carries an array of the transaction IDs still in progress, and commit status is looked up in pg_xact (the CLOG). The goals are to make taking a snapshot cheaper (O(1) instead of O(active transactions)), to make visibility on replicas more efficient, and potentially to fix consistency anomalies between primary and replica.
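To make the O(1) claim concrete, here is a minimal Python sketch contrasting the two snapshot shapes. Everything in it is a hypothetical simplification: the class names, the `committed` flag standing in for a pg_xact lookup, and the convention that a commit is visible when its CSN is at or below the snapshot's CSN are assumptions, not PostgreSQL's actual `SnapshotData` layout.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class XidArraySnapshot:
    """Hypothetical model of today's snapshot: building it copies the set of
    running transaction IDs, so its size is O(active transactions)."""
    xmin: int          # every xid below this had finished (a fast path in real code; unused here)
    xmax: int          # every xid at or above this had not started yet
    xip: frozenset     # xids still in progress when the snapshot was taken

    def sees(self, xid: int, committed: bool) -> bool:
        # `committed` stands in for a pg_xact (CLOG) status lookup.
        if xid >= self.xmax or xid in self.xip:
            return False
        return committed


@dataclass(frozen=True)
class CsnSnapshot:
    """Hypothetical CSN snapshot: a single number, O(1) to take and to copy."""
    snapshot_csn: int  # latest CSN assigned when the snapshot was taken (assumed convention)

    def sees(self, commit_csn: Optional[int]) -> bool:
        # None models a transaction that has no CSN yet (not committed).
        return commit_csn is not None and commit_csn <= self.snapshot_csn


old = XidArraySnapshot(xmin=100, xmax=105, xip=frozenset({101, 103}))
new = CsnSnapshot(snapshot_csn=500)
print(old.sees(102, committed=True), old.sees(103, committed=True))  # True False
print(new.sees(480), new.sees(510), new.sees(None))                  # True False False
```

The point of the contrast is the cost of taking and copying the snapshot: the first needs the whole running-xid set, the second only one integer.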

The fundamental tension identified in this session, and named as the primary source of complications, is the disconnect between visibility semantics and durability semantics in PostgreSQL. How far the two diverge depends on the synchronous_commit GUC:

synchronous_commit = off (async commits): The transaction is immediately visible to other sessions, but its WAL may not yet be flushed to disk. If the system crashes, the commit may be lost.
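synchronous_commit = on (sync commits): The transaction flushes its commit record to local WAL before it is marked committed and becomes visible, and then reports success. If synchronous_standby_names is set, on also waits for standbys to flush the WAL, like the remote levels below.
synchronous_commit = remote_write|remote_apply (remote commits, with synchronous_standby_names set): The transaction waits for standby confirmation (write or apply, respectively) before reporting success. During the wait it is already marked committed in pg_xact but still appears as running in the ProcArray, so new snapshots on the primary do not see it until the confirmation arrives.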

The problem is that CSN must linearize all commits into a single total order, but the current system lets transactions with wildly different durability latencies become visible at very different points relative to their commit records: an async commit is visible within microseconds of inserting its commit record, while a remote_apply commit may stay invisible for milliseconds or longer while it waits for the standby. Today these coexist without issue because each snapshot records exactly which transaction IDs were still running and commit status is checked per transaction in pg_xact, so nothing requires visibility order to match any global sequence. Once you impose a global sequence number, you must decide: does the CSN represent the moment of visibility, or the moment of durability?

The Long Fork Problem

The thread references the Long Fork consistency phenomenon, documented in Jepsen's analysis of Amazon RDS for PostgreSQL 17.4. Long Fork occurs when a primary and replica expose different visibility orderings for committed transactions. In PostgreSQL's current architecture, this happens because:

On the primary, a transaction becomes visible to new snapshots when it is removed from the ProcArray, which happens only after its synchronous_commit wait (if any) has finished.
On a replica, transactions become visible when their commit WAL record is replayed.
WAL replay order on replicas is strictly LSN-ordered, but primary visibility order is not, because concurrent commits wait for different durability levels before becoming visible.

This means two transactions T1 and T2 might be visible in order (T1, T2) on the primary but (T2, T1) on a replica, creating a consistency anomaly.
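The anomaly can be checked mechanically. The sketch below, with hypothetical names and made-up LSNs, takes the order in which the primary made commits visible, derives the order an LSN-ordered replay would produce, and lists every pair whose order is inverted. Such an inverted pair is what allows a reader on the primary and a reader on the replica to disagree about which of two commits happened first.

```python
from itertools import combinations


def visibility_inversions(primary_order, commit_lsn):
    """Return the replica's visibility order (replay is LSN-ordered) and every
    pair (a, b) that the primary made visible as a-then-b but the replica
    makes visible as b-then-a."""
    replica_order = sorted(primary_order, key=lambda xact: commit_lsn[xact])
    replica_pos = {xact: i for i, xact in enumerate(replica_order)}
    inversions = [
        (a, b)
        for a, b in combinations(primary_order, 2)  # a precedes b on the primary
        if replica_pos[a] > replica_pos[b]
    ]
    return replica_order, inversions


# Hypothetical scenario: T1's commit record has the lower LSN, but T1 becomes
# visible on the primary after T2 (e.g. T1 waited for a standby, T2 was async).
commit_lsn = {"T1": 0x1000, "T2": 0x2000}
replica_order, inversions = visibility_inversions(["T2", "T1"], commit_lsn)
print(replica_order, inversions)  # ['T1', 'T2'] [('T2', 'T1')]
```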

Proposed Solutions

Solution 1: Commit Record LSN as CSN on Replicas (Consensus)

The session reached consensus that on replicas, the LSN of the commit WAL record is a natural CSN. Replay is already strictly LSN-ordered, so this preserves existing replica visibility semantics while enabling O(1) snapshot comparisons. This is architecturally clean and non-controversial.
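A minimal sketch of the replica side, assuming (hypothetically) that a snapshot is simply the LSN of the last replayed commit record and that a transaction is visible when its commit record's LSN is at or below that value. It ignores non-commit records, which also advance the real replay position. Because replay never goes backwards, the check needs no array of running transaction IDs.

```python
class ReplicaCsnModel:
    """Hypothetical replica model: commit record LSNs serve as CSNs."""

    def __init__(self):
        self.replayed_lsn = 0
        self.commit_lsn = {}  # xid -> LSN of its replayed commit record

    def replay_commit(self, xid, lsn):
        # Replay is strictly LSN-ordered, so each commit must advance the LSN.
        if lsn <= self.replayed_lsn:
            raise ValueError(f"out-of-order replay: {lsn:#x} <= {self.replayed_lsn:#x}")
        self.commit_lsn[xid] = lsn
        self.replayed_lsn = lsn

    def take_snapshot(self):
        # O(1): the snapshot is just the current replay position.
        return self.replayed_lsn

    def sees(self, snapshot_lsn, xid):
        lsn = self.commit_lsn.get(xid)
        return lsn is not None and lsn <= snapshot_lsn


replica = ReplicaCsnModel()
replica.replay_commit("T1", 0x1000)
snap1 = replica.take_snapshot()
replica.replay_commit("T2", 0x2000)
snap2 = replica.take_snapshot()
print(replica.sees(snap1, "T1"), replica.sees(snap1, "T2"), replica.sees(snap2, "T2"))
# True False True
```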

Solution 2: LSN-based CSN on Primaries (Contentious)

Using the commit record's LSN as the CSN on the primary is more contentious because it interacts badly with synchronous_commit:

Problem (5a): An async-commit session can process thousands of transactions while a single remote_apply transaction is waiting for standby confirmation. If visibility is gated by CSN ordering, you must decide whether async commits with higher LSNs should be visible before or after a lower-LSN sync commit that hasn't yet achieved durability.
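The dilemma can be quantified with a small sketch. Assume, hypothetically, that a primary snapshot is a single LSN and that every commit at or below it is visible. To keep a sync commit that is still waiting invisible, the snapshot LSN must stay below that commit's LSN, which also hides every async commit logged after it. The function and its inputs are illustrative only.

```python
def lsn_snapshot_horizon(commits):
    """commits maps a name to (commit_lsn, ready), where ready means the commit
    has met whatever durability its synchronous_commit level requires
    (immediately, for async commits). Returns the highest snapshot LSN that
    hides every commit still waiting, plus the ready commits it hides too."""
    waiting = [lsn for lsn, ready in commits.values() if not ready]
    if not waiting:
        return max((lsn for lsn, _ in commits.values()), default=0), []
    horizon = min(waiting) - 1
    collateral = sorted(
        name for name, (lsn, ready) in commits.items()
        if ready and lsn > horizon
    )
    return horizon, collateral


commits = {
    "S1": (100, False),  # remote_apply commit, standby has not confirmed yet
    "A1": (110, True),   # async commits logged while S1 waits
    "A2": (120, True),
    "A0": (90, True),    # async commit logged before S1
}
print(lsn_snapshot_horizon(commits))  # (99, ['A1', 'A2'])
```

The other option is to let snapshots reach LSN 120, which exposes S1 before its standby has confirmed it. Either way, one class of commit changes behavior compared with today.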

Solution 3: "Commit Visible" WAL Record (Novel Proposal)

A suggested innovation is to log a separate 'commit visible' WAL record that is written only after a transaction's COMMIT record has met its durability requirement. Key properties:

Multiple commits could share a single 'commit visible' record (analogous to commit_delay/commit_siblings batching WAL fsyncs), limiting WAL amplification.
This is conceptually similar to 2PC's separate COMMIT PREPARED step, but unlike a prepared transaction it cannot be rolled back: any committed-but-not-yet-visible transaction automatically becomes visible when recovery ends or a standby is promoted.
The CSN would be derived from this record's LSN rather than the commit record's LSN.
This cleanly separates the durability event from the visibility event in WAL, enabling replicas to honor the same visibility ordering as the primary.
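The batching and CSN assignment described above can be sketched as a toy WAL in which commit records and "commit visible" records share one LSN sequence. Everything here is hypothetical: the record names, the one-LSN-per-record numbering, and the rule that a commit's CSN is the LSN of the "commit visible" record covering it. Because the CSN lives in WAL, a replica replaying the same records in LSN order derives the same CSNs as the primary.

```python
class CommitVisibleWal:
    """Toy WAL: COMMIT records plus batched COMMIT_VISIBLE records."""

    def __init__(self):
        self.records = []   # (lsn, kind, xids), in LSN order
        self.ready = []     # xids whose COMMIT has met its durability requirement
        self.csn = {}       # xid -> LSN of the COMMIT_VISIBLE record covering it

    def _append(self, kind, xids):
        lsn = len(self.records) + 1          # one LSN per record, for simplicity
        self.records.append((lsn, kind, tuple(xids)))
        return lsn

    def log_commit(self, xid):
        return self._append("COMMIT", [xid])

    def durability_met(self, xid):
        self.ready.append(xid)

    def log_commit_visible(self):
        """Cover every ready commit with a single COMMIT_VISIBLE record."""
        if not self.ready:
            return None
        batch, self.ready = self.ready, []
        lsn = self._append("COMMIT_VISIBLE", batch)
        for xid in batch:
            self.csn[xid] = lsn
        return lsn


def replay_csns(records):
    """CSNs a replica would derive by replaying the records in LSN order."""
    return {xid: lsn for lsn, kind, xids in records
            if kind == "COMMIT_VISIBLE" for xid in xids}


wal = CommitVisibleWal()
for xid in ("T1", "T2", "T3"):
    wal.log_commit(xid)          # LSNs 1, 2, 3
wal.durability_met("T2")         # async commits: ready at once
wal.durability_met("T3")
wal.log_commit_visible()         # LSN 4 covers T2 and T3
wal.durability_met("T1")         # remote_apply commit: standby confirmed later
wal.log_commit_visible()         # LSN 5 covers T1
print(wal.csn, replay_csns(wal.records) == wal.csn)  # {'T2': 4, 'T3': 4, 'T1': 5} True
```

Three commits cost two extra records here; the more commits become ready between two "commit visible" records, the more of them share one.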

Tradeoff: This adds WAL volume and latency to the visibility path. The suggestion in point 2b of the thread (making async commits wait until sync commits' CSNs are durable) would impose latency on async-commit sessions that read data recently modified by sync-commit sessions.

Solution 4: In-Memory Counter CSN (Pragmatic Fallback)

An alternative approach (6a) uses a local in-memory counter to generate CSNs only at the moment of visibility, without logging them to WAL:

Pro: Preserves current primary visibility semantics exactly. Enables CSN-based O(1) snapshot benefits on the primary without WAL changes.
Con: Does NOT solve the Long Fork problem (primary and replica would still have potentially different visibility orderings). The CSN would not survive crash recovery (would need reconstruction from CLOG). May require more implementation effort than LSN-based approaches.
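The "Con" can be seen in a short sketch. Assume, hypothetically, a counter that hands out the next CSN at the moment each commit becomes visible on the primary, while the replica still orders commits by commit-record LSN. The primary's CSN order then matches its visibility order exactly, but nothing in WAL carries it, so the replica's order can differ. The class name and the LSN values are illustrative.

```python
import itertools


class InMemoryCsnCounter:
    """Hypothetical primary-side counter: a CSN is drawn when a commit becomes
    visible and is never written to WAL."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self.csn = {}

    def make_visible(self, xid):
        self.csn[xid] = next(self._next)
        return self.csn[xid]


def compare_orders(counter, commit_lsn):
    primary = sorted(counter.csn, key=counter.csn.get)   # CSN order on the primary
    replica = sorted(commit_lsn, key=commit_lsn.get)     # replay order on a replica
    return primary, replica


counter = InMemoryCsnCounter()
counter.make_visible("T2")   # async commit, visible at once
counter.make_visible("T1")   # commit that becomes visible later, e.g. after a standby wait
primary, replica = compare_orders(counter, {"T1": 0x1000, "T2": 0x2000})
print(primary, replica)      # ['T2', 'T1'] ['T1', 'T2']
```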

Key Architectural Insight: The Two-Phase Nature of Commit

The deepest insight from this session is that PostgreSQL's commit is already implicitly two-phase in the presence of synchronous_commit:

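Phase 1 (Commit): The transaction inserts its commit record into WAL.
Phase 2 (Durability confirmation): The WAL is flushed locally and/or confirmed by standbys, and the client is notified of success.

Currently, where visibility falls between these phases depends on the durability level. An async commit marks itself committed in pg_xact and leaves the ProcArray (becoming visible) right after Phase 1, while a sync commit becomes visible only after its local flush and any standby wait have completed. Neither moment is recorded in WAL; a replica simply makes each transaction visible when it replays the commit record. The "commit visible" record proposal would make this two-phase nature explicit in WAL, enabling replicas to reproduce the primary's visibility semantics faithfully.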

Implications for the PostgreSQL Architecture

Snapshot scalability: CSN eliminates the need to copy arrays of active transaction IDs into snapshots, which is critical for high-connection-count workloads.
Replication consistency: LSN-based or "commit visible"-based CSN could enable replicas to guarantee the same visibility ordering as the primary, eliminating Long Fork.
WAL format changes: Any WAL-logged CSN approach requires WAL format changes, which can only ship in a major release and raise compatibility concerns for tools and replication setups that consume WAL.
pg_xact interaction: CSN doesn't necessarily eliminate CLOG — it may run in parallel for backward compatibility, or CLOG could be derived from CSN during recovery.

Assessment

The community appears to be converging on a layered approach: use commit-record LSN as CSN on replicas (uncontroversial), while the primary CSN mechanism remains an open design question with multiple viable approaches of increasing ambition (in-memory counter → commit LSN → "commit visible" record). The thread demonstrates that the community is taking the Jepsen Long Fork findings seriously but has not committed to solving them as part of the initial CSN implementation.

PGConf.dev CSN unconference session: notes and follow-up discussion takeaways
