[PATCH] no table rewrite when set column type to constrained domain

First seen: 2025-07-09 18:00:15+00:00 · Messages: 12 · Participants: 4

Latest Update

2026-05-18 · claude-opus-4-6

Technical Analysis: Avoiding Table Rewrite When Setting Column Type to Constrained Domain

Core Problem

When PostgreSQL executes ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE, it traditionally performs a full table rewrite — creating a new physical copy of the entire table with the column stored in the new type. This is expensive for large tables (acquiring AccessExclusiveLock, copying all rows, rebuilding indexes).

However, there's an important class of type changes where a rewrite is unnecessary: changing a column to a constrained domain over the same base type. For example:

CREATE DOMAIN positive_int AS INT CHECK(VALUE > 0) NOT NULL;
ALTER TABLE big_table ALTER COLUMN x SET DATA TYPE positive_int;

Here, the physical storage representation is identical (int4int4). The only semantic difference is that new domain constraints must be validated against existing data. This validation requires a table scan (to check each row against the domain's CHECK constraints), but NOT a table rewrite (no data needs to be physically transformed or copied).

The distinction matters enormously for large tables: a scan is O(n) reads, while a rewrite is O(n) reads + O(n) writes + index rebuilds + WAL generation.

Architectural Context

PostgreSQL's ALTER TABLE pipeline has three phases:

  1. ATPrepAlterColumnType (Phase 1): Determines what coercion expression is needed, calls ATColumnChangeRequiresRewrite() to decide if a rewrite is necessary.
  2. ATRewriteTables (Phase 2): Orchestrates table rewrites.
  3. ATRewriteTable (Phase 3): Actually scans/rewrites the table, evaluating NewColumnValue expressions.

The key data structure is NewColumnValue, which holds the expression to transform old column values into new ones. The existing code treats the presence of any NewColumnValue entry as implying a table rewrite is needed.

Proposed Solution (Evolution Across Versions)

v1-v2: coerce_to_domain / scan_only flag

The initial approach added a boolean to NewColumnValue indicating that only a scan (not a rewrite) is needed. The field was first called coerce_to_domain, then renamed to scan_only for generality.

v3: need_compute flag (inverted semantics)

Renamed to need_compute — when true, the expression result is written back to the tuple (rewrite); when the coercion is purely a domain constraint check, the expression is evaluated only for its side effect (raising an error if constraints are violated), and the existing tuple data is left unchanged.

v5-v6: Refined scope

After realizing that ArrayCoerceExpr (casting array element types) genuinely requires a rewrite, the patch narrowed its optimization: only when the new type is a constrained domain directly over the old base type (where ATColumnChangeRequiresRewrite() already returns false) does it apply the scan-only path. The expression tree must be a simple CoerceToDomain wrapping a Var reference to the original column.

Key Implementation Details

  1. Expression evaluation without writing back: In the scan-only path, ExecEvalExpr() is called on the CoerceToDomain expression. If domain constraints are violated, this raises an ERROR (aborting the ALTER). If it succeeds, the return value is discarded — the tuple is not modified.

  2. Volatility check: The patch restricts scan-only behavior to domains where all constraints are non-volatile. A volatile constraint (e.g., CHECK(VALUE > random(...)) could produce different results on re-evaluation, so a rewrite might be semantically necessary to "lock in" the validated state. This is somewhat debatable (see below).

  3. Tuple descriptor manipulation: The patch temporarily swaps the tuple descriptor during constraint evaluation because the expression expects to operate on the old column type, but the scan context may already reflect metadata changes.

  4. Unconstrained domain edge case: Viktor identified that for CREATE DOMAIN mydom AS int (no constraints), the patch's tab->newvals check causes an unnecessary scan where previously nothing was done.

Key Technical Disagreements and Open Issues

Volatility Requirement

Viktor questioned why volatile constraints require a rewrite: "A rewrite doesn't make things more deterministic right?" This is a valid point — if the constraint uses random(), it's non-deterministic whether you check during a scan or during a rewrite. The author's conservative stance avoids subtle correctness issues but may be overly restrictive.

USING Expression Bug

Viktor discovered a critical correctness bug: when a USING clause provides a constant (e.g., USING 5), the scan-only path was incorrectly applied, leaving the original value unchanged instead of writing the constant. This is because the optimization must ONLY apply when the expression is purely a domain constraint check over the existing value — any actual value transformation requires a rewrite.

ExecEvalExpr vs ExecEvalExprNoReturn

The code uses (void) ExecEvalExpr(...) rather than ExecEvalExprNoReturn() because the expression was compiled with ExecInitExpr (which sets up EEOP_DONE_RETURN as the final step). Using ExecEvalExprNoReturn would be incorrect for an expression compiled this way. This was poorly explained in comments and confused reviewers.

Unconstrained Domain Regression

For unconstrained domains (no CHECK, no NOT NULL), the previous behavior was to skip both rewrite and scan entirely (just a catalog update). The patch's addition of || tab->newvals to the ATRewriteTable trigger condition may cause an unnecessary full-table scan for these cases.

Design Tradeoffs

Aspect Conservative (current patch) Aggressive alternative
Volatile constraints Force rewrite Allow scan-only
USING clause Force rewrite if expression isn't identity coercion Could try to detect pure-constraint expressions
Unconstrained domains Need fix to avoid unnecessary scan Skip entirely
Domain over arrays Force rewrite Complex analysis of element coercion

Current Status (as of v6)

The patch is functional for the core case (constrained domain over same base type, no USING override). Outstanding issues: