support create index on virtual generated column.

First seen: 2025-03-26 07:15:00+00:00 · Messages: 17 · Participants: 6

Latest Update

2026-05-06 · opus 4.7

Core Problem

PostgreSQL 18 introduced virtual generated columns (commit 83ea6c5), where the column's value is computed on read rather than stored. This provides a "free" schema refactoring primitive: the generation expression can be altered via ALTER TABLE ... ALTER COLUMN ... SET EXPRESSION without rewriting the heap. However, the initial implementation explicitly prohibited creating indexes on virtual generated columns. There is a comment in DefineIndex():

/*
 * XXX Virtual generated columns in index expressions or predicates
 * could be supported, but it needs support in
 * RelationGetIndexExpressions() and RelationGetIndexPredicate().
 */

This thread attempts to lift that restriction. The motivation (articulated most clearly by Corey Huinker) is migration ergonomics: users who already have expression indexes like CREATE INDEX ON t (lower(name)) should be able to introduce a virtual column lname GENERATED ALWAYS AS (lower(name)) VIRTUAL and have queries referencing the new column name use the pre-existing index — i.e., the virtual column should be a transparent "alias" that the planner can match against expression indexes.

Proposed Solutions and Design Tensions

Approach 1 (Jian He, v1–v7): Full first-class support with catalog extension

The patch transforms CREATE INDEX t(vcol) internally into an expression index on the generation expression — CREATE INDEX t ((a*2)) — and then tracks the original virtual column's attnum in a new pg_index column indattrgenerated. This new column is needed because after expansion, pg_index.indkey and indexprs look identical to a pure expression index; without the extra column, ALTER COLUMN SET EXPRESSION and ALTER COLUMN TYPE cannot identify which indexes depend on the virtual column and must be rebuilt.

The patch also adds dependencies between the index and:

  1. The virtual column's pg_attribute entry (so DROP COLUMN cascades to the index).
  2. The columns referenced inside the generation expression.

Restrictions imposed by the patch:

Tom Lane's objection (the key architectural pushback)

Tom Lane raised a fundamental objection: indexing a virtual column destroys its defining advantage. The whole point of VIRTUAL is that SET EXPRESSION is free; an index forces a rebuild, recreating exactly the cost profile of a STORED generated column plus an index. Furthermore, expression indexes have existed "for decades" and already solve the underlying use case.

This objection carries significant weight — Tom is a core committer with deep optimizer expertise — and it effectively reframes the feature from "new capability" to "syntactic convenience / migration aid."

Approach 2 (Peter Eisentraut's counter-proposal)

Peter Eisentraut (the committer who authored virtual generated columns) proposed a much simpler minimum viable version:

  1. Allow virtual generated columns inside expression indexes and predicates (the thing the XXX comment already anticipated).
  2. Prohibit SET EXPRESSION on a column used in any index.

This sidesteps the entire complexity stack: no new pg_index column, no ALTER TABLE recursion logic, no rebuild-on-expression-change machinery, no changes to indkey semantics. It reduces the feature to making RelationGetIndexExpressions() / RelationGetIndexPredicate() expand virtual column references.

Peter also pointed out that Jian's patch deleted the XXX comment without actually modifying the two functions it references — suggesting either the comment is wrong, or the patch is incomplete.

Jian He's rebuttal on architecture

Jian's response to Peter is technically incisive and exposes a subtle catalog representation question:

"I don't think it's a good idea to store the virtual generated columns as is in Form_pg_index->indkey because IndexInfo->ii_IndexAttrNumbers and Form_pg_index->indkey are referenced in too many places (see BuildIndexInfo). For every single occurrence of indkey.values[i], we need to consider whether it's ok for it to be a virtual generated column."

The argument is that storing the virtual column's attnum directly in indkey forces every consumer of indkey throughout the executor and planner to become virtual-column-aware — a huge audit surface. By expanding the expression at catalog-write time and storing the expanded form in indexprs/indpred, all downstream code continues to work unchanged. Hence RelationGetIndexExpressions and RelationGetIndexPredicate need no modification — the comment is simply wrong.

This is the correct insight architecturally, but it cuts against Peter's simpler proposal because expansion-at-catalog-write-time is what creates the need for the new indattrgenerated column to track the original virtual column for invalidation purposes.

Key Technical Insights

Planner transparency

Jian demonstrated that with expression expansion, the planner naturally handles both query forms identically:

EXPLAIN SELECT ... WHERE lowercase_name = 'x';      -- virtual col reference
EXPLAIN SELECT ... WHERE lower(regular_name) = 'x'; -- raw expression

Both produce the same plan because get_relation_info() sees the same indexprs. This satisfies Corey's migration use case "for free" — the index works regardless of which form the query uses, which is actually a point in favor of the feature.

Dependency correctness bug (v1)

Kirill Reshke caught a concrete bug in v1: ALTER TABLE DROP COLUMN on the virtual column did not drop the index, leaving a zombie "........pg.dropped.2........" entry. This was fixed in v2 by adding explicit dependencies on both the virtual column attribute and the columns referenced by its generation expression.

Bootstrap mode crash

Soumya Murali discovered that ComputeIndexAttrs() crashes in bootstrap mode (during initdb) because it's called with pstate == NULL while system catalog indexes are built, before syscache is fully initialized. The virtual-generated-column rewrite path must be guarded with IsBootstrapProcessingMode(). This is a classic pitfall for any catalog-touching code added to the index creation path.

The SET EXPRESSION semantics question

A lingering unresolved question posed by Jian: when the generation expression changes, should the index be rebuilt (current v8 behavior) or marked invalid (forcing explicit REINDEX)? The rebuild path matches ALTER COLUMN TYPE behavior but makes the feature's cost worse; the invalidation path is cheaper but surprising. Peter's proposal sidesteps this entirely by forbidding SET EXPRESSION when indexed.

Status and Trajectory

After ~11 months the patch has reached v9 (Kirill's rebase over commit 570e2fcc041a). The architectural direction is not settled: Peter (the virtual columns author and likely committer for this area) is pushing toward the minimal variant, while Jian has implemented the maximal variant. Tom's skepticism about the feature's value proposition has not been directly addressed. The feature is unlikely to commit until either (a) Jian produces the minimal version Peter asked for, or (b) there is consensus that the indattrgenerated catalog addition is justified by the UX benefit over plain expression indexes.