[PATCH] Add pg_get_table_ddl() to reconstruct CREATE TABLE statements

First seen: 2026-06-03 12:58:52+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-06-04 · claude-opus-4-6

Technical Analysis: pg_get_table_ddl() for CREATE TABLE Statement Reconstruction

Core Problem

PostgreSQL has incrementally built a family of DDL-reconstruction functions (pg_get_database_ddl, pg_get_role_ddl, pg_get_tablespace_ddl) that allow programmatic extraction of the SQL statements needed to recreate catalog objects. However, the most complex and commonly needed object type — tables — has lacked such a function. Users and tools (pgAdmin, pg_dump alternatives, migration frameworks) must either shell out to pg_dump or manually stitch together information from multiple catalog views (pg_class, pg_attribute, pg_constraint, pg_index, pg_attrdef, etc.) to reconstruct a complete CREATE TABLE statement.

This patch proposes pg_get_table_ddl(regclass, VARIADIC text[]) to fill that gap, returning a set of DDL statements (one per row) that fully reconstruct a table including all its sub-objects.

Architectural Significance

Why This Matters

  1. In-server DDL deparse completeness: PostgreSQL has been moving toward having authoritative DDL reconstruction in-core (ruleutils.c already handles views, indexes, constraints). A table-level function is the logical capstone — it's the most complex object with the most sub-object dependencies.

  2. Decoupling from pg_dump: Currently, pg_dump is the only reliable way to get reproducible DDL. But pg_dump is an external tool with its own dependency-ordering logic. An in-server function enables use cases like logical replication DDL replay, schema diffing within SQL, and extension-based migration tools without external process invocation.

  3. Round-trip fidelity: The patch explicitly targets round-trip correctness — the generated DDL, when executed, should produce a table whose catalog state is identical to the original (verified via EXCEPT diffs).

Design Decisions and Tradeoffs

1. Set-Returning Function (One Statement Per Row)

Rather than returning a single monolithic DDL string, the function returns multiple rows — one for the CREATE TABLE, then additional rows for ALTER TABLE, CREATE INDEX, CREATE RULE, CREATE STATISTICS, etc. This design:

2. Direct Reuse of C Helpers vs. New SQL Wrappers

The patch deliberately calls existing ruleutils.c internals (pg_get_indexdef_string, pg_get_constraintdef_command, etc.) directly from the new C function rather than exposing them as separate SQL-level functions (pg_get_index_ddl, pg_get_constraint_ddl).

Tradeoff: This keeps the SQL API surface small but couples the implementation tightly to internal helper signatures. If those helpers change, only one function needs updating — but it also means users who want just an index DDL must still use the older, less-consistent interfaces.

3. Option System (VARIADIC text[])

The use of VARIADIC text[] for options (pretty-printing, owner inclusion, tablespace inclusion, includes_* toggles) follows the pattern established by the existing _ddl family. This is extensible without catalog changes but loses type safety — invalid option names are runtime errors rather than compile-time errors.

4. Default-Omission Convention

Clauses are omitted when their value equals what the system would apply by default. For example:

This produces cleaner, more portable DDL but means the output is not a byte-for-byte representation of what the user originally typed.

5. Testing Strategy (pg_regress vs. TAP)

The author chose pg_regress (expected-output .out file comparison) over TAP tests, arguing that exact string matching is more rigorous for a deparse function. This is a defensible choice — TAP tests with regex matching could mask subtle deparse bugs — though it creates maintenance burden when output format changes.

Coverage Analysis

The patch handles a comprehensive matrix of table features:

Category Features Covered
Column-level typmod, COLLATE, STORAGE, COMPRESSION, GENERATED (stored/virtual), IDENTITY (always/by default), DEFAULT, NOT NULL, attoptions
Table-level UNLOGGED, INHERITS, PARTITION BY (range/list/hash), PARTITION OF...FOR VALUES, access method, WITH (reloptions), TABLESPACE, inline CHECK
Sub-objects Indexes, PK/UNIQUE/FK/EXCLUDE constraints, named NOT NULL, Rules, Extended Statistics, REPLICA IDENTITY, RLS enable/force, child-local DEFAULT overrides

Notable Gaps (Acknowledged)

Potential Review Concerns

  1. Virtual generated columns: These are a relatively new feature (PG 17+). The deparse logic for virtual columns interacts with expression deparse in complex ways, especially regarding column references within the generation expression.

  2. Partition handling complexity: Reconstructing PARTITION BY clauses and FOR VALUES specifications requires careful handling of partition bound datums, especially for multi-column range partitions and hash partitions with modulus/remainder.

  3. IDENTITY column sequence options: When identity columns have non-default sequence options, the patch must reconstruct the full GENERATED {ALWAYS|BY DEFAULT} AS IDENTITY (sequence_options) syntax, which requires querying pg_sequence and comparing against defaults.

  4. Inheritance interactions: Columns inherited from parent tables, constraint inheritance, and the interaction between INHERITS and PARTITION OF all add complexity to determining what should be explicitly stated vs. what is implied.

  5. Idempotency and IF NOT EXISTS: The patch description doesn't mention whether IF NOT EXISTS is an option — this matters for use cases like schema synchronization.

Relationship to Existing Infrastructure

The function builds on substantial existing infrastructure in ruleutils.c:

The new code likely lives in a new file or extends ruleutils.c significantly, orchestrating calls to these existing workers while handling the table-level framing that none of them address.