2026-05-27 · claude-opus-4-6

Deep Technical Analysis: Data-Level Dependencies in pg_dump/pg_upgrade

The Core Problem

PostgreSQL's pg_dump architecture separates output into distinct sections: pre-data (schema definitions), data (table contents via COPY), and post-data (indexes, constraints, triggers). This separation is fundamental to how pg_dump enables parallel restore and logical ordering. However, it creates an implicit assumption: no schema definition depends on the data content of another table.

This assumption breaks with certain extension patterns, most notably PostGIS. The canonical example involves PostGIS's spatial_ref_sys table, which contains spatial reference system definitions (SRID entries). When a user table has a column defined with a geometry type constrained to a specific SRID (e.g., geometry(Point, 4326)), the column's type modifier validation requires that the referenced SRID already exists as a row in spatial_ref_sys at the time the column is created. This creates a cross-section dependency: a pre-data object (table DDL) depends on another table's data section content.

During pg_restore or pg_upgrade, this manifests as:

Pre-data section runs: all table definitions are created
User table with geometry column references SRID 4326
The type modifier function queries spatial_ref_sys — but it's empty because data hasn't been loaded yet
Restore fails

This is architecturally significant because pg_depend — PostgreSQL's dependency tracking catalog — records object-to-object dependencies (OIDs), not object-to-row dependencies. There is no existing catalog infrastructure to express "this column definition depends on specific rows in another table."

Proposed Solutions

Solution 1: `--extra-dependencies` Option (Original Patch)

Jeevan Chalke's patch introduces a user-facing option for both pg_dump and pg_upgrade:

Mechanism:

Users specify dependencies in the format table#referenced_table
pg_dump injects these as synthetic edges in the topological sort graph (the same graph used for standard pg_depend-based ordering)
For referenced tables, data is dumped immediately after the table's schema definition, breaking the normal section separation

Key Implementation Details:

Leverages dumpTableData() infrastructure but invokes it during pre-data emission
Effectively creates a "pre-data data" subsection for specific tables
The dependency hint forces ordering in the sort algorithm without requiring catalog changes

Tradeoffs:

Requires user knowledge of implicit dependencies (bad UX)
Works for pg_upgrade where the tool can be told about known extension patterns
Doesn't solve the general case; every new extension with this pattern needs manual configuration
Breaks the clean section separation that parallel restore relies on

Solution 2: `user_catalog_table` + pg_dump Ordering (Peter Eisentraut)

Peter Eisentraut proposes a more principled approach:

Mechanism:

PostGIS marks spatial_ref_sys with ALTER TABLE ... SET (user_catalog_table = true)
pg_dump is modified to recognize user catalog tables and dump their contents in the pre-data section (before other DDL that might depend on them)

Why this is architecturally superior:

user_catalog_table already exists in PostgreSQL (it enables HOT-avoidance and marks tables as infrastructure-like)
No user-facing options needed; the dependency information is stored in the catalog (reloptions)
pg_dump can generically handle all such cases without per-extension knowledge
The extension author declares intent, not the end user

Remaining work: pg_dump doesn't currently inspect user_catalog_table for ordering decisions, so code changes are still needed.

Solution 3: Extension-First Restoration (Paul Ramsey's Suggestion)

Paul Ramsey questions why extensions aren't restored completely (schema + data) before user objects:

Rationale:

Extension tables like spatial_ref_sys are already known to pg_dump as extension members (via pg_extension_config_dump)
If all extension objects (including their data) were fully materialized before any user DDL, the dependency would be naturally satisfied

Challenges:

This would require restructuring pg_dump's section model significantly
Extensions can themselves depend on user objects in complex upgrade scenarios
May introduce circular dependency issues in edge cases
Parallel restore optimization assumes section boundaries

Architectural Tension

The fundamental tension in this thread is between three philosophies:

User-side workaround (Chalke): Give users/tools a knob to inject missing dependency information. Pragmatic but inelegant.
Catalog-driven autodiscovery (Eisentraut): Use existing catalog mechanisms (user_catalog_table) to let pg_dump infer the correct behavior. Clean but requires extension cooperation.
Structural reordering (Ramsey): Change pg_dump's fundamental ordering to handle extensions as atomic units. Most comprehensive but most invasive.

Matthias van de Meent's feedback crystallizes the community preference: dependencies should be stored in catalogs and resolved automatically, not require user input. This aligns with PostgreSQL's general philosophy of catalog-driven behavior.

Current Status

As of the last message (May 2025), no consensus has been reached. Peter Eisentraut's user_catalog_table approach appears to have the most traction as a "principled" solution, but Paul Ramsey raises valid questions about whether it introduces unintended side effects for extension tables and whether a simpler ordering change (extension-complete-first) would suffice. The original patch remains a proof-of-concept without commitment path.

Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

Latest Update

Deep Technical Analysis: Data-Level Dependencies in pg_dump/pg_upgrade

The Core Problem

Proposed Solutions

Solution 1: `--extra-dependencies` Option (Original Patch)

Solution 2: `user_catalog_table` + pg_dump Ordering (Peter Eisentraut)

Solution 3: Extension-First Restoration (Paul Ramsey's Suggestion)

Architectural Tension

Current Status

Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

Latest Update

Deep Technical Analysis: Data-Level Dependencies in pg_dump/pg_upgrade

The Core Problem

Proposed Solutions

Solution 1: --extra-dependencies Option (Original Patch)

Solution 2: user_catalog_table + pg_dump Ordering (Peter Eisentraut)

Solution 3: Extension-First Restoration (Paul Ramsey's Suggestion)

Architectural Tension

Current Status

Solution 1: `--extra-dependencies` Option (Original Patch)

Solution 2: `user_catalog_table` + pg_dump Ordering (Peter Eisentraut)