Technical Analysis: pg_get_table_ddl() for CREATE TABLE Statement Reconstruction
Core Problem
PostgreSQL has incrementally built a family of DDL-reconstruction functions (pg_get_database_ddl, pg_get_role_ddl, pg_get_tablespace_ddl) that allow programmatic extraction of the SQL statements needed to recreate catalog objects. However, the most complex and commonly needed object type — tables — has lacked such a function. Users and tools (pgAdmin, pg_dump alternatives, migration frameworks) must either shell out to pg_dump or manually stitch together information from multiple catalog views (pg_class, pg_attribute, pg_constraint, pg_index, pg_attrdef, etc.) to reconstruct a complete CREATE TABLE statement.
This patch proposes pg_get_table_ddl(regclass, VARIADIC text[]) to fill that gap, returning a set of DDL statements (one per row) that fully reconstruct a table including all its sub-objects.
Architectural Significance
Why This Matters
-
In-server DDL deparse completeness: PostgreSQL has been moving toward having authoritative DDL reconstruction in-core (ruleutils.c already handles views, indexes, constraints). A table-level function is the logical capstone — it's the most complex object with the most sub-object dependencies.
-
Decoupling from pg_dump: Currently,
pg_dumpis the only reliable way to get reproducible DDL. Butpg_dumpis an external tool with its own dependency-ordering logic. An in-server function enables use cases like logical replication DDL replay, schema diffing within SQL, and extension-based migration tools without external process invocation. -
Round-trip fidelity: The patch explicitly targets round-trip correctness — the generated DDL, when executed, should produce a table whose catalog state is identical to the original (verified via
EXCEPTdiffs).
Design Decisions and Tradeoffs
1. Set-Returning Function (One Statement Per Row)
Rather than returning a single monolithic DDL string, the function returns multiple rows — one for the CREATE TABLE, then additional rows for ALTER TABLE, CREATE INDEX, CREATE RULE, CREATE STATISTICS, etc. This design:
- Allows callers to filter/reorder sub-objects
- Matches the existing
_ddlfamily convention - Avoids the question of statement ordering within a single text blob
2. Direct Reuse of C Helpers vs. New SQL Wrappers
The patch deliberately calls existing ruleutils.c internals (pg_get_indexdef_string, pg_get_constraintdef_command, etc.) directly from the new C function rather than exposing them as separate SQL-level functions (pg_get_index_ddl, pg_get_constraint_ddl).
Tradeoff: This keeps the SQL API surface small but couples the implementation tightly to internal helper signatures. If those helpers change, only one function needs updating — but it also means users who want just an index DDL must still use the older, less-consistent interfaces.
3. Option System (VARIADIC text[])
The use of VARIADIC text[] for options (pretty-printing, owner inclusion, tablespace inclusion, includes_* toggles) follows the pattern established by the existing _ddl family. This is extensible without catalog changes but loses type safety — invalid option names are runtime errors rather than compile-time errors.
4. Default-Omission Convention
Clauses are omitted when their value equals what the system would apply by default. For example:
STORAGEis omitted if it matches the type's default storage strategyCOMPRESSIONis omitted if it matches the defaultTABLESPACEis omitted if it's the default tablespace
This produces cleaner, more portable DDL but means the output is not a byte-for-byte representation of what the user originally typed.
5. Testing Strategy (pg_regress vs. TAP)
The author chose pg_regress (expected-output .out file comparison) over TAP tests, arguing that exact string matching is more rigorous for a deparse function. This is a defensible choice — TAP tests with regex matching could mask subtle deparse bugs — though it creates maintenance burden when output format changes.
Coverage Analysis
The patch handles a comprehensive matrix of table features:
| Category | Features Covered |
|---|---|
| Column-level | typmod, COLLATE, STORAGE, COMPRESSION, GENERATED (stored/virtual), IDENTITY (always/by default), DEFAULT, NOT NULL, attoptions |
| Table-level | UNLOGGED, INHERITS, PARTITION BY (range/list/hash), PARTITION OF...FOR VALUES, access method, WITH (reloptions), TABLESPACE, inline CHECK |
| Sub-objects | Indexes, PK/UNIQUE/FK/EXCLUDE constraints, named NOT NULL, Rules, Extended Statistics, REPLICA IDENTITY, RLS enable/force, child-local DEFAULT overrides |
Notable Gaps (Acknowledged)
- Triggers: Deferred pending
pg_get_trigger_ddl(attributed to Phil's re-roll) - Policies: Deferred pending
pg_get_policy_ddl - COMMENT ON / GRANT-REVOKE: Intentionally excluded (matching existing
_ddlfamily) - Typed tables (
CREATE TABLE ... OF type): Out of scope
Potential Review Concerns
-
Virtual generated columns: These are a relatively new feature (PG 17+). The deparse logic for virtual columns interacts with expression deparse in complex ways, especially regarding column references within the generation expression.
-
Partition handling complexity: Reconstructing
PARTITION BYclauses andFOR VALUESspecifications requires careful handling of partition bound datums, especially for multi-column range partitions and hash partitions with modulus/remainder. -
IDENTITY column sequence options: When identity columns have non-default sequence options, the patch must reconstruct the full
GENERATED {ALWAYS|BY DEFAULT} AS IDENTITY (sequence_options)syntax, which requires queryingpg_sequenceand comparing against defaults. -
Inheritance interactions: Columns inherited from parent tables, constraint inheritance, and the interaction between
INHERITSandPARTITION OFall add complexity to determining what should be explicitly stated vs. what is implied. -
Idempotency and IF NOT EXISTS: The patch description doesn't mention whether
IF NOT EXISTSis an option — this matters for use cases like schema synchronization.
Relationship to Existing Infrastructure
The function builds on substantial existing infrastructure in ruleutils.c:
pg_get_indexdef_worker()— index DDL reconstructionpg_get_constraintdef_worker()— constraint DDLpg_get_expr()— expression deparse (for defaults, check constraints, generated columns)pg_get_partkeydef()— partition key deparse
The new code likely lives in a new file or extends ruleutils.c significantly, orchestrating calls to these existing workers while handling the table-level framing that none of them address.