Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade

First seen: 2025-12-24 11:44:18+00:00 · Messages: 5 · Participants: 4

Latest Update

2026-05-27 · claude-opus-4-6

Deep Technical Analysis: Data-Level Dependencies in pg_dump/pg_upgrade

The Core Problem

PostgreSQL's pg_dump architecture separates output into distinct sections: pre-data (schema definitions), data (table contents via COPY), and post-data (indexes, constraints, triggers). This separation is fundamental to how pg_dump enables parallel restore and logical ordering. However, it creates an implicit assumption: no schema definition depends on the data content of another table.

This assumption breaks with certain extension patterns, most notably PostGIS. The canonical example involves PostGIS's spatial_ref_sys table, which contains spatial reference system definitions (SRID entries). When a user table has a column defined with a geometry type constrained to a specific SRID (e.g., geometry(Point, 4326)), the column's type modifier validation requires that the referenced SRID already exists as a row in spatial_ref_sys at the time the column is created. This creates a cross-section dependency: a pre-data object (table DDL) depends on another table's data section content.

During pg_restore or pg_upgrade, this manifests as:

  1. Pre-data section runs: all table definitions are created
  2. User table with geometry column references SRID 4326
  3. The type modifier function queries spatial_ref_sys — but it's empty because data hasn't been loaded yet
  4. Restore fails

This is architecturally significant because pg_depend — PostgreSQL's dependency tracking catalog — records object-to-object dependencies (OIDs), not object-to-row dependencies. There is no existing catalog infrastructure to express "this column definition depends on specific rows in another table."

Proposed Solutions

Solution 1: --extra-dependencies Option (Original Patch)

Jeevan Chalke's patch introduces a user-facing option for both pg_dump and pg_upgrade:

Mechanism:

Key Implementation Details:

Tradeoffs:

Solution 2: user_catalog_table + pg_dump Ordering (Peter Eisentraut)

Peter Eisentraut proposes a more principled approach:

Mechanism:

Why this is architecturally superior:

Remaining work: pg_dump doesn't currently inspect user_catalog_table for ordering decisions, so code changes are still needed.

Solution 3: Extension-First Restoration (Paul Ramsey's Suggestion)

Paul Ramsey questions why extensions aren't restored completely (schema + data) before user objects:

Rationale:

Challenges:

Architectural Tension

The fundamental tension in this thread is between three philosophies:

  1. User-side workaround (Chalke): Give users/tools a knob to inject missing dependency information. Pragmatic but inelegant.
  2. Catalog-driven autodiscovery (Eisentraut): Use existing catalog mechanisms (user_catalog_table) to let pg_dump infer the correct behavior. Clean but requires extension cooperation.
  3. Structural reordering (Ramsey): Change pg_dump's fundamental ordering to handle extensions as atomic units. Most comprehensive but most invasive.

Matthias van de Meent's feedback crystallizes the community preference: dependencies should be stored in catalogs and resolved automatically, not require user input. This aligns with PostgreSQL's general philosophy of catalog-driven behavior.

Current Status

As of the last message (May 2025), no consensus has been reached. Peter Eisentraut's user_catalog_table approach appears to have the most traction as a "principled" solution, but Paul Ramsey raises valid questions about whether it introduces unintended side effects for extension tables and whether a simpler ordering change (extension-complete-first) would suffice. The original patch remains a proof-of-concept without commitment path.