[Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW

First seen: 2025-12-08 20:58:27+00:00 · Messages: 10 · Participants: 5

Latest Update

2026-05-22 · claude-opus-4-6

Technical Analysis: Add WHERE Clause Support to REFRESH MATERIALIZED VIEW

The Core Problem

PostgreSQL's materialized views (MVs) have a fundamental scalability limitation: the only way to update them is a full refresh, which re-executes the entire underlying query and replaces all rows. For large MVs where only a small subset of data changes frequently, this is an O(N) operation where O(delta) would suffice. This forces users into one of two workarounds:

  1. Manual table maintenance: Replace the MV with a regular table and maintain it via complex trigger logic — essentially hand-rolling incremental view maintenance (IVM).
  2. Separate high-churn columns: Split frequently-updated data into separate structures joined at query time.

Both approaches sacrifice the declarative simplicity that makes materialized views attractive. The patch proposes allowing REFRESH MATERIALIZED VIEW mv WHERE <predicate> to restrict the refresh scope to a subset of the view's output rows, enabling O(delta) refreshes with minimal user complexity.

Architectural Context

This feature sits at the intersection of several PostgreSQL subsystems:

Proposed Solutions and Their Evolution

v1: DELETE → INSERT with ROW EXCLUSIVE Lock

The initial implementation used a "Prune + Upsert" strategy:

  1. DELETE all MV rows matching the WHERE clause
  2. INSERT new data from the source query with ON CONFLICT DO UPDATE

This used ROW EXCLUSIVE locks (rather than ACCESS EXCLUSIVE), allowing concurrent reads and writes. However, the author discovered a consistency gap: once rows are DELETEd, their physical tuple locks are released. A concurrent transaction can insert a conflicting logical row during this window, leaving the MV inconsistent.

v2 (Proposed): Transaction-Level Advisory Locks

The author proposed using pg_advisory_xact_lock(mv_oid, hashtext(ROW(key_cols)::text)) to lock the logical identity of rows rather than physical tuples. This would serialize concurrent refreshes on the same logical rows while allowing non-overlapping refreshes to run in parallel.

Abandoned because it hit max_locks_per_transaction limits at scale — each row being refreshed would consume a lock slot, making bulk operations impractical.

v3 (Current): FOR UPDATE + Single CTE

The final approach:

  1. SELECT FROM mv WHERE ... FOR UPDATE — locks existing rows matching the predicate, serializing overlapping concurrent refreshes
  2. A single CTE that: evaluates the underlying query, upserts results into the MV, and deletes stale rows via anti-join

This avoids the consistency gap because FOR UPDATE locks are held until transaction commit. Non-overlapping refreshes still run in parallel since they lock different rows.

Concurrent Partial Refresh

When CONCURRENTLY is specified, the patch reuses the existing refresh_by_match_merge diff infrastructure but limits the scope of the diff (and temp table population) to rows matching the predicate. This requires EXCLUSIVE lock (same as full concurrent refresh).

Key Design Tensions

The CONCURRENTLY Semantics Inversion

A significant design issue identified by the author and reviewers: with a WHERE clause, the lock semantics of CONCURRENTLY become inverted from user expectations.

Mode MV Lock Concurrent Reads Concurrent Writes
Full Refresh ACCESS EXCLUSIVE Blocked Blocked
CONCURRENTLY (Full) EXCLUSIVE Allowed Blocked
WHERE (non-concurrent) ROW EXCLUSIVE Allowed Allowed
CONCURRENTLY WHERE EXCLUSIVE Allowed Blocked

Historically, CONCURRENTLY means "more permissive" — it allows readers. But with WHERE, the non-concurrent path is already more permissive (allows both readers and writers), making CONCURRENTLY the more restrictive option. This is unintuitive and unresolved in the thread.

The author notes that CONCURRENTLY has historically referred specifically to the diff-based algorithm, not just a lower lock level, suggesting leaving the implementations as-is despite the semantic oddity.

Unique Index Requirement

Both refresh modes require a unique index on the MV. For non-concurrent partial refresh, this is needed for:

This is a new requirement for non-concurrent refresh (full refresh doesn't need it), which may surprise users.

Volatile Function Restriction

The WHERE clause forbids volatile functions to ensure correctness — the predicate must be deterministic across the refresh operation. This is enforced in transformRefreshWhereClause().

Bug Found in Review

Dharin Shah identified a concrete bug: when building the ON CONFLICT target list, the code used indnatts (which includes INCLUDE columns) rather than indnkeyatts (key attributes only). For an index like CREATE UNIQUE INDEX ON mv(id) INCLUDE (extra), this generates ON CONFLICT (id, extra) which fails because INCLUDE columns aren't part of the uniqueness constraint. The fix is straightforward — use indnkeyatts for conflict target generation.

Performance Characteristics

The benchmark results demonstrate the O(delta) nature of partial refresh:

The critical insight: partial refresh performance is independent of total MV size, depending only on the number of affected rows and the efficiency of predicate push-down to base tables.

Open Questions and Unresolved Issues

  1. Deadlock potential: With FOR UPDATE + UPSERT, overlapping refreshes could potentially deadlock if they lock rows in different orders. No mitigation is discussed.
  2. Predicate push-down reliability: The feature's performance depends on the optimizer being able to push the WHERE condition down to base tables. Complex views with aggregation, CTEs, or window functions may not benefit.
  3. SQL standard deviation: This is explicitly a non-standard extension. The thread doesn't discuss whether alternative syntax (e.g., a separate command) would be more appropriate.
  4. Community interest signal: The author explicitly asked whether the community wants this feature in core. No committer has responded in the thread, which is notable for a feature of this scope.
  5. IVM interaction: PostgreSQL has ongoing IVM work. How this feature relates to or competes with full incremental view maintenance is not discussed.

Assessment

The patch addresses a real and widely-felt pain point. The technical approach is sound but complex, particularly around concurrency. The lack of committer engagement in the thread is concerning for a feature that touches grammar, parser, executor, locking, and SPI. The feature would benefit from a clear statement of intended guarantees and explicit documentation of limitations (unique index requirement, predicate push-down dependency, potential deadlocks).