Incremental View Maintenance, take 2

First seen: 2023-06-01 14:59:09+00:00 · Messages: 44 · Participants: 6

Latest Update

2026-05-25 · claude-opus-4-6

Incremental View Maintenance (IVM) for PostgreSQL — Deep Technical Analysis

Core Problem

Materialized views in PostgreSQL can only be refreshed via REFRESH MATERIALIZED VIEW, which recomputes the entire view from scratch. For large views where only a small fraction of base data changes, this is extremely wasteful. Incremental View Maintenance (IVM) solves this by computing only the delta (changes) to the materialized view when base tables are modified, then applying those deltas directly — avoiding full recomputation.

This is architecturally significant because it transforms materialized views from a batch-oriented caching mechanism into a near-real-time derived data structure, opening the door to "query rewriting using materialized views" (AQUMV) in OLAP workloads and dramatically improving OLTP scenarios where views summarize frequently-updated tables.

Proposed Design

Trigger-Based Architecture

The implementation uses statement-level AFTER triggers on all base tables referenced in the view definition. When a base table is modified:

  1. BEFORE triggers create tracking entries, count trigger firings, and capture a pre-modification snapshot.
  2. AFTER triggers preserve transition tables (the inserted/deleted rows). When the BEFORE and AFTER trigger counts match (indicating the final AFTER trigger), the actual view maintenance executes.

This design leverages PostgreSQL's existing transition table infrastructure (added in PG10 for triggers) but requires a critical modification: prolonging the lifespan of transition table tuplestores beyond their normal scope (end of nested query) to the end of the outermost query. This is necessary when multiple tables are modified in a single statement (via foreign keys, CTEs, etc.).

Delta Computation Algorithm

For a view V = SELECT * FROM R, S:

Where S_pre is the pre-update state of S, computed by filtering inserted tuples (using IVM_visible_in_prestate() with the pre-modification snapshot) and appending deleted tuples from the old transition table via UNION ALL.

Duplicate Tuple Handling

Views can contain duplicate rows. Multiplicity is tracked via count(*) in delta computations. Deletions use row_number() to identify exactly which physical tuples to remove. Insertions use generate_series() to replicate tuples to their required multiplicity.

DISTINCT Support

A hidden column __ivm_count__ stores tuple multiplicity. When tuples are deleted, the count is decremented; when it reaches zero, the row is physically removed. This requires hiding columns with the __ivm_ prefix from SELECT * queries by modifying parse_relation.c.

Aggregate Support

Built-in aggregates (count, sum, avg, min, max) are supported via additional hidden columns storing intermediate state (e.g., for avg(x): both sum(x) and count(x) are maintained). The maintenance logic generates complex UPDATE SET clauses via SPI-executed SQL. Min/max require special handling: when the current min/max value is deleted, a full recalculation from base tables is needed (stored in recalc_and_set_values()).

Concurrency Control

When the view references multiple tables, an exclusive lock is acquired before maintenance to prevent inconsistent results. Single-table views can use weaker locks, except for DELETE/UPDATE operations which rely on ctid-based tuple identification.

Key Design Tensions and Open Questions

1. Trigger-Based vs. Executor-Based Implementation

The trigger approach reuses existing infrastructure but has limitations:

An executor-based approach (like declarative partitioning) would be cleaner but requires more invasive changes. This remains the most fundamental architectural question.

2. Syntax: New Keyword vs. Reloption

The proposed CREATE INCREMENTAL MATERIALIZED VIEW syntax requires a new keyword. Kirill Reshke argues for CREATE MATERIALIZED VIEW ... WITH (maintain_incr=true) which:

Nagata has expressed openness to this approach.

3. Feature Scope for Initial Commit

There's a tension between completeness and reviewability. The full patch includes DISTINCT, aggregates, and duplicate-tuple support — all requiring hidden columns and complex delta logic. Nagata has suggested potentially limiting the first release to basic IVM (no aggregates, no DISTINCT, no duplicate tuples) to focus on core design review.

4. Hidden Columns

The __ivm_ prefix approach is a pragmatic hack that modifies parse_relation.c to hide columns by name pattern. A proper solution would be a pg_attribute flag for column visibility, but that proposal (from another thread) is dormant. This remains controversial — it reserves a broad namespace pattern and makes the catalog representation non-obvious.

5. Pre-Update State Computation

The current approach crafts subqueries with visibility functions. An alternative would be snapshot-based: allow executor scan nodes to use a specified snapshot for specific tables (similar to Oracle's AS OF clause). This would be cleaner but requires non-trivial executor changes.

6. Bitmapset Misuse for OID Tracking

Zhang Mingli from Apache Cloudberry identified a bug: CreateIvmTriggersOnBaseTablesRecurse() stores rte->relid (OID, potentially >2^31) in a Bitmapset designed for small integers. With OIDs around 17000, this already wastes ~0.2MB. With OIDs exceeding 0x7FFFFFFF, assertion failures occur. The fix is to use a hash table instead.

Patch Structure (v34, 11 patches)

  1. Grammar/Syntax — New CREATE INCREMENTAL MATERIALIZED VIEW production
  2. Catalogrelisivm boolean in pg_class
  3. Trigger InfrastructureSetTransitionTablePreserved for extended tuplestore lifetime
  4. pg_dump — Dump IMMVs with correct syntax
  5. psql — Tab completion, \d+ output showing "Incremental view maintenance: yes"
  6. Core IVM — Trigger creation, delta computation, delta application (basic joins/WHERE)
  7. DISTINCT — Hidden __ivm_count__ column, apply_old/new_delta_with_count()
  8. Aggregates — count/sum/avg with hidden helper columns
  9. Min/Max — Recalculation fallback when extremal values are deleted
  10. Tests — Regression test suite
  11. Documentation — SGML docs in rules chapter

Maturity Assessment

This patch has been in development since 2018 (original thread) with 34+ versions. It is architecturally ambitious but faces several challenges for committability:

The most promising path forward appears to be: adopt reloption syntax, limit initial scope to basic IVM without aggregates/DISTINCT, resolve the trigger vs. executor question, and ensure proper EXPLAIN/CONTEXT integration.