Technical Analysis: Determinism of pg_dump Table Output Ordering
Core Problem
A user observed that two semantically identical PostgreSQL schemas—one created from scratch via DDL scripts and one produced by applying migration scripts to a copy of production—yield different pg_dump output when dumped. The difference is purely in the ordering of 3 tables (not their definitions), producing 8 textual diffs (4 for table definitions, 4 for permissions). This matters because the user's migration validation workflow depends on byte-identical schema dumps to confirm correctness.
The fundamental question is whether pg_dump's output ordering is fully deterministic given identical logical schemas, or whether it can be influenced by physical catalog state (e.g., OID assignment order, pg_depend row ordering, heap tuple layout).
Architectural Context: How pg_dump Orders Objects
pg_dump's ordering pipeline has two stages:
-
sortDumpableObjectsByTypeName— An initial sort that groups objects by type and then sorts by name within each type. This provides a human-readable baseline order and is the "primary sort key" Tom Lane references. -
sortDumpableObjects— A topological sort that respects dependency edges (frompg_dependand inferred dependencies). When the topological sort has multiple valid orderings (i.e., objects with no dependency relationship between them), the tie-breaking falls back to the initial name-based order.
The critical insight is that topological sort is inherently non-unique: when objects are not connected by dependency chains, their relative order depends on implementation details—specifically the order in which they enter the priority queue or worklist. If the initial sort is fully deterministic (which sortDumpableObjectsByTypeName aims to be), and the dependency graph is identical, the output should be stable. But if either condition fails, ordering divergence can occur.
Potential Root Causes Identified in Discussion
1. Differing Dependency Graphs Despite Identical Schemas
The user's setup involves a version-crossing scenario: a PostgreSQL 14.22 server with a PostgreSQL 18.1 client pg_dump. When pg_dump queries catalog tables on a v14 server, it uses compatibility code paths that may infer slightly different dependency edges than when querying a v18 server. The user noted that the 3 misordered tables had dependencies related to inheritance hierarchies—exactly the kind of complex dependency that has seen significant rework between major versions.
If migration-applied production has different OIDs (and thus different pg_depend entries by OID) than the from-scratch database, even though the logical dependencies are equivalent, the topological sort could traverse them differently. OIDs are assigned sequentially, and the order of DDL execution affects OID assignment, which affects catalog row ordering, which can affect the order pg_dump reads dependencies.
2. Locale Sensitivity in Name-Based Sorting
Tom Lane raised the possibility of locale-dependent sorting of object names. The user confirmed all names are ASCII [a-z0-9_] and the locale is C, effectively ruling this out. Under C locale, strcmp-based ordering is fully deterministic and portable.
3. NOT NULL Constraint Handling Changes in v18
The discussion surfaced a secondary issue: the user initially believed they found a pg_dump bug where CREATE TABLE test_child (NOT NULL f) INHERITS (...) appeared syntactically invalid. This turned out to be a new syntax in PostgreSQL 18 (NOT NULL column_name as a standalone table constraint), which didn't exist in v14. This is a direct consequence of the significant NOT NULL constraint refactoring in v18 (tracked across multiple commits). While this wasn't actually a bug, the user noted that the same tables involved in this inheritance pattern were the ones exhibiting ordering differences—suggesting that the dependency edges generated for inheritance hierarchies with NOT NULL constraints may differ between the two schema creation paths.
4. Cross-Version Dump Stability
The most architecturally significant factor is the cross-version scenario. pg_dump is designed to dump from older servers, but the dependency inference logic must handle catalog representations that differ across versions. v14 stores NOT NULL constraints differently than v18 (v18 has pg_constraint entries for NOT NULL, v14 does not). pg_dump's compatibility layer for v14 must synthesize these, and subtle differences in how that synthesis occurs relative to a native v18 catalog could produce different dependency graphs—hence different topological sort outcomes.
The "NOT NULL as table_constraint" Tangent
The user's test case:
CREATE TABLE test_parent (f integer);
CREATE TABLE test_child (PRIMARY KEY (f)) INHERITS (test_parent);
Dumps as:
CREATE TABLE test_child (
NOT NULL f
) INHERITS (test_parent);
In PostgreSQL 18, this is valid syntax. The NOT NULL column_name form is a table-level constraint syntax added as part of the NOT NULL constraint normalization work. The column f is inherited from test_parent (so not redeclared), the NOT NULL is a standalone constraint (implied by the primary key), and the primary key itself is added later in the dump (after data loading, as is standard for pg_dump). Tom Lane confirmed this is correct behavior.
Resolution Status
The thread is unresolved. No definitive root cause was identified. Tom Lane confirmed that pg_dump ordering is intended to be deterministic and that regression tests depend on this property. He suggested trying v18.4 (multiple NOT NULL-related fixes landed between 18.1 and 18.4). The user agreed to continue investigating to produce a minimal reproduction case.
The most likely explanation, synthesizing the discussion, is that the cross-version dump scenario (v14 server + v18 client) combined with inheritance hierarchies and the v18 NOT NULL constraint refactoring produces subtly different dependency graphs depending on the physical catalog state (OID ordering), breaking the topological sort's determinism guarantees.
Key Takeaway for pg_dump Maintainers
The dependency-based topological sort in sortDumpableObjects should be audited for cases where tie-breaking among unrelated objects can vary based on catalog-physical (rather than catalog-logical) state. Any place where OID order leaks into the sort—particularly through the order of dependency edge discovery—is a potential source of non-determinism. This is especially relevant for cross-version dumps where dependency synthesis may not perfectly mirror native catalog state.