Disallow whole-row index references with virtual generated columns?

First seen: 2026-05-08 12:41:08+00:00 · Messages: 5 · Participants: 3

Latest Update

2026-05-09 · opus 4.7

Whole-Row Var References and Virtual Generated Columns: A Semantic Leak in Index Predicates/Expressions

The Core Problem

Virtual generated columns (introduced in PG18) are not materialized in the heap tuple; they are computed on read via expansion in the planner/executor. This creates a fundamental tension with index machinery, which operates on physical heap tuples during index build and HOT-path index maintenance.

DefineIndex already knows about this tension: it explicitly rejects index expressions and predicates that reference a virtual generated column by name (i.e., a Var with a specific varattno pointing at a GENERATED ... VIRTUAL attribute). The existing source comment acknowledges this is a conservative limitation — it "could be supported, but it needs support in RelationGetIndexExpressions() and RelationGetIndexPredicate()" to perform the virtual-column expansion at index-time the same way the planner does for queries.

Ayush Tiwari's report identifies a hole in that guard: whole-row Var references (varattno == 0, which conceptually denotes the entire composite row including all user-visible columns) are not covered by the check. Because a whole-row Var logically includes the virtual generated column's value at the SQL level, but the index code sees only the physical heap tuple, there is a semantic mismatch between what SQL says the predicate/expression evaluates to and what the index actually stores.

Concrete symptom 1 — partial index predicate lies

CREATE TABLE t (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
CREATE UNIQUE INDEX ON t (a) WHERE t IS NOT NULL;
INSERT INTO t(a) VALUES (1);
INSERT INTO t(a) VALUES (1);   -- succeeds! uniqueness silently violated

At SQL level, t IS NOT NULL is true for (1, 2). But the predicate evaluator, running against the heap tuple (1, <unset>), decides differently (or at least differently enough that both rows end up either excluded from or inconsistently handled by the unique index), bypassing the uniqueness enforcement.

Concrete symptom 2 — whole-row expression index stores wrong value

CREATE UNIQUE INDEX ON virtual_expr_u ((virtual_expr_u));

A SELECT virtual_expr_u FROM virtual_expr_u returns (1,2), but the index stores (1,) — the unexpanded heap tuple — as shown by the error detail Key ((virtual_expr_u.*))=((1,)). The index is keying on a value that does not match what any SQL query would produce for the same expression. This is a correctness bug: two independent evaluations of the same expression give different answers depending on whether you are a SQL query or an index.

Design Question: Scope of the Restriction

The proposed patch extends the existing DefineIndex check so that varattno == 0 is also rejected — but only when the indexed relation actually has at least one virtual generated column. This preserves whole-row expression indexes on ordinary tables, which do work today.

Álvaro Herrera (committer) pushed back on two grounds:

  1. Is this virtual-column-specific at all? He observes the original WHERE t IS NOT NULL example ALSO fails to enforce uniqueness on a plain-column table with nullable columns, because (1, NULL) IS NOT NULL is false — so the row is correctly excluded from a partial index. Ayush conceded this example was poor: for a normal nullable column the behavior is semantically correct; for a virtual column it is not, because SQL-level evaluation of IS NOT NULL on the composite row yields true (since the virtual column expands to 2), yet the index excludes the row.

  2. Is the base restriction even justified? Álvaro suggests the current blanket prohibition on virtual columns in indexes may be incidental ("just because we haven't implemented it yet"), not fundamental. The proper fix might be to teach RelationGetIndexExpressions/RelationGetIndexPredicate to expand virtual generated columns, rather than to extend the prohibition.

Matthias van de Meent raised an orthogonal but related concern: whole-row IS [NOT] NULL expressions in indexes are arguably not immutable in the sense the index machinery requires, because adding a new column with a non-NULL default can retroactively flip the value of row IS NOT NULL for existing rows — silently corrupting any index that depends on the predicate. He suggests such expressions perhaps should not be allowed in indexes at all. His test script demonstrates divergent behavior between STORED and VIRTUAL variants, confirming the virtual-column case is specifically broken.

Why This Matters Architecturally

The mismatch points at a layering issue: virtual generated column expansion is performed at query rewrite/plan time (via expand_generated_columns_in_query / in the ExpandSingleTableVar paths), but index build and ExecIndexBuildScanKeys-style predicate evaluation operate below that layer, directly on TupleTableSlots sourced from the heap. Any expression that reaches down to whole-row composition (ROW() of all attributes, including a whole-row Var) will, without explicit handling, observe the un-expanded physical tuple.

Two coherent fixes exist:

Álvaro's remark that "I'm not really sure what's the usefulness of indexes over whole-row types" suggests the low-priority path: the restrictive fix is probably adequate because the use case is thin. But the underlying correctness gap — SQL and index disagreeing about the value of the same expression — is a real bug regardless of how common the pattern is.

Open Questions Left by the Thread