2026-05-18 · claude-opus-4-6
Sequence Access Methods: A Deep Technical Analysis
The Core Problem
PostgreSQL's sequence infrastructure is monolithic: every sequence stores its state (last_value, is_called, log_cnt) in a single-page heap relation, computes values via a hardcoded local increment algorithm, and WAL-logs changes through a dedicated RMGR (RM_SEQ_ID). This architecture is fundamentally incompatible with distributed/active-active deployments where:
- Value uniqueness across nodes cannot be guaranteed without external coordination or schema modifications (e.g., different INCREMENT values per node, or replacing sequence defaults with custom functions).
- Schema transparency is broken — migrating an application from single-node to multi-node requires altering column defaults, which is unacceptable for large-scale deployments.
- Generated columns and SERIAL/IDENTITY are tightly coupled to the local computation, making it impossible to transparently substitute alternative value-generation strategies.
The historical context is significant: Postgres-XC (2009-2012) solved this by patching sequence.c to fetch values from a Global Transaction Manager. BDR/pgEdge explored similar approaches. The fundamental insight is that the computation method for sequence values should be pluggable, just as storage is pluggable via table AMs and indexing via index AMs.
Architectural Design
Separation of Concerns
The patch identifies three conceptual layers historically entangled in sequence.c:
- pg_sequence catalog properties — start, increment, min, max, cycle, cache. These are DDL-level properties shared across all AMs.
- Local sequence cache — per-backend caching for
lastval(), tracking the last sequence accessed.
- Sequence metadata/computation — the actual value generation, storage, and WAL-logging.
The key design decision is that (1) and (2) remain universal across all AMs, while (3) becomes the pluggable component. This means pg_sequence continues to provide "hints" to the AM about desired behavior, cross-property validation remains centralized, and the CACHE mechanism is handled by core code — not individual AMs.
Callback Architecture
The sequence AM handler (defined in sequenceam.h) provides callbacks for:
- init() — Create sequence storage (attributes, initial tuple), or do nothing for storage-less sequences
- get() — Retrieve current state (last_value, is_called) for dump/restore and
pg_sequence_last_value()
- set() — Force state (for
setval() and restore)
- next() — Compute the next value (the core computation callback)
- reset() — Handle ALTER SEQUENCE state resets
- get_table_am() — Declare which table AM (if any) the sequence uses for storage
This plugs into nextval_internal() via function pointers cached in the relcache, achieving transparency for:
nextval(), currval(), lastval(), setval()
- SERIAL and GENERATED ALWAYS AS IDENTITY columns
- pg_dump/pg_restore
Storage Model
A critical design choice: sequences remain relations (stored in pg_class), but the AM controls whether they have physical storage and what attributes that storage contains. The patch uses the same pattern as views — DefineRelation() creates the pg_class entry with no attributes, then AlterTableInternal() adds AM-specific columns afterward. This means:
- The "seqlocal" AM creates 3 columns (last_value, log_cnt, is_called) in a heap relation
- A snowflake AM might store timestamp + counter in an unlogged table
- An in-memory AM might store nothing at all
WAL and Persistence
Each AM controls its own WAL strategy:
- The in-core "seqlocal" AM retains the existing RM_SEQ_ID RMGR (renamed to RM_SEQ_LOCAL_ID)
- Custom AMs can use generic WAL (like the snowflake contrib module does)
- AMs can choose to skip WAL entirely for truly ephemeral sequences
- The
fill_seq_with_data() function handles UNLOGGED logic (writing to both main and init forks)
Relcache Integration
Sequences gain rd_seqamroutine (cached AM callbacks) and optionally rd_tableam in their RelationData. A new RELKIND_HAS_SEQUENCE_AM() macro mirrors RELKIND_HAS_TABLE_AM(). The relcache initialization path branches:
if (RELKIND_HAS_TABLE_AM(relkind))
RelationInitTableAccessMethod(rel);
else if (relkind == RELKIND_SEQUENCE)
RelationInitSequenceAccessMethod(rel);
Dump/Restore Strategy
The patch introduces:
default_sequence_access_method GUC (analogous to default_table_access_method)
--no-sequence-access-method flag for pg_dump/pg_restore
- SET commands emitted between CREATE/ALTER SEQUENCE in dump output
- Dependency on
pg_sequence_last_value() returning (last_value, is_called) tuple rather than scanning heap directly
Key Technical Debates
"Is this really an Access Method?"
Peter Eisentraut and others questioned whether "access method" is the right abstraction. The argument against: sequences don't "access" data in the traditional sense — they generate it. The argument for: the existing AM infrastructure (CREATE ACCESS METHOD, pg_am catalog, handler functions, GUCs) provides ready-made plumbing that would be wasteful to duplicate. Michael's position prevailed pragmatically: sequences are already relations, and the AM pattern fits.
Scope of Abstraction
Peter Eisentraut raised a fundamental design concern: the patch conflates high-level concerns (how to compute the next value) with low-level concerns (table AM, persistence, storage). He argued these should be separable layers — e.g., "local sequence of UUIDs" vs "global sequence of integers" should be independently combinable.
Michael acknowledged this but argued that:
- The current proposal solves real problems today
- Custom data types can be layered on top later via additional DDL clauses
- 64-bit integer values cover all practical distributed use cases he's encountered
Custom Data Types (int64 restriction)
The patch hardcodes all sequence values as int64. Peter advocated for UUID support and flexible types. Michael's counterarguments:
- The SQL specification requires sequences to work on integer values
- 64 bits provides sufficient space even for thousands of nodes (snowflake-style partitioning)
- Custom types would require different SQL functions (breaking transparency with
nextval())
- An informal poll showed ~50% of respondents considered 64b sufficient
This remains an unresolved design tension — the patch explicitly punts on it.
Performance Impact
Michael's benchmarks showed negligible overhead from function pointer indirection:
- N=5M nextval calls: HEAD=3230ms (646ns/value) vs patch=3315ms (663ns/value)
- ~2.6% overhead, within noise
Lepihov suggested a fast-path optimization: wire the default "seqlocal" handler OID directly to avoid catalog lookups in RelationInitSequenceAccessMethod. Michael acknowledged this as worth investigating but focused on correctness first.
Reloptions vs GUCs for AM Parameters
The patch deliberately omits reloptions for sequences (e.g., for configuring snowflake machine_id). Michael argued this can be added later and isn't a strict requirement — GUCs defined within the AM module suffice for initial use cases. Peter noted that the LCG use case (Matthias's PRNG example) would need modulus/multiplier/increment parameters, which the patch doesn't directly support.
The Snowflake Contrib Module
Patch 0007 provides a concrete implementation demonstrating the API:
- Stores state in an unlogged table (timestamps + machine IDs ensure cross-node uniqueness)
- Uses generic WAL for init fork logging
- Implements snowflake ID structure: timestamp bits + machine_id bits + local counter bits
- Machine ID controlled via GUC
- Demonstrates that
lastval(), currval(), nextval() work transparently
- Identity columns can feed from snowflake sequences without schema changes
Relationship to Logical Replication of Sequences
A parallel effort (led by Vignesh C and others) adds logical replication support for sequences. The sequence AM patch is designed to be orthogonal:
- Logical decoding depends on relfilenodes and page LSNs — the "seqlocal" AM preserves these
- Custom AMs that don't use local storage would need their own replication strategy
- The
get() callback provides a universal way to extract current state for initial sync
- Non-transactional sequence behavior is deliberately unchanged
File Organization (Final State)
src/backend/access/sequence/seqlocalam.c — "seqlocal" AM callbacks
src/backend/access/sequence/seqlocalxlog.c — WAL redo for seqlocal
src/backend/commands/sequence.c — Core sequence logic (30% smaller)
src/include/access/sequenceam.h — AM callback definitions
src/include/access/sequence_page.h — Helper macros for single-page AMs
contrib/snowflake/ — Example snowflake ID AM
Current Status and Outlook
After 3+ years and 28+ revisions, the patch remains in a "rebase loop" without consensus from core committers. Michael acknowledges "zero feedback from other core developers" and suspects "nobody really cares about this thread." The patch is technically mature — regression tests pass, dump/restore works, the API is well-exercised — but lacks the community momentum needed for committal. The unconference at pgconf.dev 2024 did not produce consensus, with discussion drifting toward logical replication of sequences instead.
The fundamental tension remains: this is infrastructure that primarily benefits distributed PostgreSQL deployments (pgEdge, Neon, custom active-active systems), while core PostgreSQL development energy has focused on built-in logical replication features that address a subset of the same problems differently.