Core Problem
pg_dump emits a spurious OVERRIDING SYSTEM VALUE clause in generated INSERT statements for tables that no longer have an identity column but once did. The clause is technically harmless at replay time (PostgreSQL accepts OVERRIDING SYSTEM VALUE even when no identity column exists in the insert target list — or, more precisely, it is simply a no-op when there is nothing to override), but the output is semantically misleading and, more practically, produces noisy diffs in deterministic-dump-based CI pipelines.
Root Cause in the Catalog
The bug stems from a well-known PostgreSQL design choice: ALTER TABLE ... DROP COLUMN does not physically remove the pg_attribute row. Instead it sets attisdropped = true, leaving the tuple in place so that tuple descriptors for existing on-disk heap tuples remain consistent (column ordering and physical layout are preserved; the dropped column becomes a "ghost" attribute). Many other attribute fields — including attidentity, atthasdef, attnotnull, and type information — are not reset when a column is dropped. This is intentional for the on-disk format, but it means any code iterating pg_attribute by attnum > 0 must explicitly filter out attisdropped rows when interpreting semantic column properties.
The Specific Bug Path in pg_dump
In src/bin/pg_dump/pg_dump.c, getTableAttrs() loops over all attributes with attnum > 0, including dropped ones (because pg_dump needs to know about them to reconstruct the correct column layout on restore). Within that loop:
tbinfo->needs_override = tbinfo->needs_override ||
(tbinfo->attidentity[j] == ATTRIBUTE_IDENTITY_ALWAYS);
Because the dropped column's attidentity still holds 'a' (ATTRIBUTE_IDENTITY_ALWAYS), the OR-accumulator needs_override is latched to true for the entire table, even though no surviving column is an identity column. Later, dumpTableData_insert() consults needs_override and unconditionally inserts the OVERRIDING SYSTEM VALUE clause.
Proposed Fix
The submitted patch gates the needs_override computation on !attisdropped[j]. Andreas (Karlsson) suggested two stylistic reformulations to avoid a long line:
if (tbinfo->attidentity[j] == ATTRIBUTE_IDENTITY_ALWAYS &&
!tbinfo->attisdropped[j])
tbinfo->needs_override = true;
This is a minimal, surgical fix confined to pg_dump. The author (Bernbaum) explicitly considered and rejected the alternative of clearing attidentity inside the DROP COLUMN path in the backend, correctly reasoning that:
- It would not help existing clusters whose catalogs already carry stale
attidentityvalues from past drops — upgrades via pg_upgrade carry those catalog rows forward verbatim. - It would introduce a behavioral change in the catalog layer for what is really a reporting-tool defect.
This reasoning is architecturally sound. The invariant "dropped attribute rows retain their pre-drop metadata" is long-standing; clients are expected to filter. Fixing the consumer (pg_dump) is the right layer.
Broader Implication: Is This a Class of Bugs?
Andreas's second message raises the key question that elevates this from a one-line fix to something worth auditing: are there other places in pg_dump (or elsewhere) where dropped attributes' stale metadata leaks into generated output? He specifically calls out hasdefaults, which follows an identical accumulation pattern — if a dropped column had a DEFAULT, atthasdef may still be true on the dropped attribute row, and similar OR-accumulators in pg_dump could be similarly poisoned. This is exactly the right instinct: when you find one bug of pattern "iterate pg_attribute including dropped columns and OR a flag", you should grep for all such patterns.
Other candidates worth checking (not raised in-thread but implied by the pattern):
- Generated columns (
attgenerated) — same storage location as identity, same risk. attnotnullaccumulation for table-level constraints.- Collation/typmod handling when those values are consulted outside the dropped-column filter.
Process / Submission Feedback
Andreas's procedural comments are worth noting because they reflect normal pgsql-hackers hygiene:
- CommitFest registration. Patches not registered on commitfest.postgresql.org are routinely lost in list traffic. First-time contributors frequently miss this.
git format-patchformat. The original patch was apparently a plain diff that did not apply cleanly viagit am. Format-patch output carries author metadata, commit message, and 3-way-merge base info that makes review reproducible. The-vflag for versioned resubmissions is the accepted convention.- Test placement in
002_pg_dump.pl. This is the centralized TAP test harness that runs pg_dump/pg_restore against a single shared cluster with a matrix of dump modes (data-only, schema-only, --inserts, --column-inserts, binary, etc.). Adding new test cases here is vastly cheaper than spinning up a dedicated test file with its owninit/start— the shared cluster model is the whole point of that file's design, even though (as Andreas admits) the file itself is notoriously dense and hard to navigate.
Assessment
The patch is correct, minimal, and targets the right layer. Outstanding work before commit:
- Reformat per Andreas's readability suggestion.
- Re-submit via
git format-patch. - Register in CF 59.
- Add a regression case to
002_pg_dump.plexercising the drop-identity-column-then-dump scenario. - Audit sibling accumulators (especially
hasdefaults) for the same dropped-column-leak pattern and, if found, fold those fixes into the same patch series.
No committer has weighed in yet; Andreas Karlsson is a regular contributor/reviewer but not the eventual committer of record. The fix is small enough that it is likely to be back-patched to all supported branches once it lands, since identity columns shipped in PG 10 and the bug has been latent since then.