Technical Analysis: Add WHERE Clause Support to REFRESH MATERIALIZED VIEW
The Core Problem
PostgreSQL's materialized views (MVs) have a fundamental scalability limitation: the only way to update them is a full refresh, which re-executes the entire underlying query and replaces all rows. For large MVs where only a small subset of data changes frequently, this is an O(N) operation where O(delta) would suffice. This forces users into one of two workarounds:
- Manual table maintenance: Replace the MV with a regular table and maintain it via complex trigger logic — essentially hand-rolling incremental view maintenance (IVM).
- Separate high-churn columns: Split frequently-updated data into separate structures joined at query time.
Both approaches sacrifice the declarative simplicity that makes materialized views attractive. The patch proposes allowing REFRESH MATERIALIZED VIEW mv WHERE <predicate> to restrict the refresh scope to a subset of the view's output rows, enabling O(delta) refreshes with minimal user complexity.
Architectural Context
This feature sits at the intersection of several PostgreSQL subsystems:
- Grammar/Parser: New optional WHERE clause in the REFRESH statement
- Rewrite/Optimizer: Predicate push-down from the MV output columns into the underlying base table queries
- Executor/SPI: The refresh logic uses SPI to inject predicates into source queries during execution
- Locking/Concurrency: The most architecturally complex aspect — managing consistency without ACCESS EXCLUSIVE locks
- Catalog/Index: Dependency on unique indexes for conflict resolution
Proposed Solutions and Their Evolution
v1: DELETE → INSERT with ROW EXCLUSIVE Lock
The initial implementation used a "Prune + Upsert" strategy:
- DELETE all MV rows matching the WHERE clause
- INSERT new data from the source query with ON CONFLICT DO UPDATE
This used ROW EXCLUSIVE locks (rather than ACCESS EXCLUSIVE), allowing concurrent reads and writes. However, the author discovered a consistency gap: once rows are DELETEd, their physical tuple locks are released. A concurrent transaction can insert a conflicting logical row during this window, leaving the MV inconsistent.
v2 (Proposed): Transaction-Level Advisory Locks
The author proposed using pg_advisory_xact_lock(mv_oid, hashtext(ROW(key_cols)::text)) to lock the logical identity of rows rather than physical tuples. This would serialize concurrent refreshes on the same logical rows while allowing non-overlapping refreshes to run in parallel.
Abandoned because it hit max_locks_per_transaction limits at scale — each row being refreshed would consume a lock slot, making bulk operations impractical.
v3 (Current): FOR UPDATE + Single CTE
The final approach:
SELECT FROM mv WHERE ... FOR UPDATE— locks existing rows matching the predicate, serializing overlapping concurrent refreshes- A single CTE that: evaluates the underlying query, upserts results into the MV, and deletes stale rows via anti-join
This avoids the consistency gap because FOR UPDATE locks are held until transaction commit. Non-overlapping refreshes still run in parallel since they lock different rows.
Concurrent Partial Refresh
When CONCURRENTLY is specified, the patch reuses the existing refresh_by_match_merge diff infrastructure but limits the scope of the diff (and temp table population) to rows matching the predicate. This requires EXCLUSIVE lock (same as full concurrent refresh).
Key Design Tensions
The CONCURRENTLY Semantics Inversion
A significant design issue identified by the author and reviewers: with a WHERE clause, the lock semantics of CONCURRENTLY become inverted from user expectations.
| Mode | MV Lock | Concurrent Reads | Concurrent Writes |
|---|---|---|---|
| Full Refresh | ACCESS EXCLUSIVE | Blocked | Blocked |
| CONCURRENTLY (Full) | EXCLUSIVE | Allowed | Blocked |
| WHERE (non-concurrent) | ROW EXCLUSIVE | Allowed | Allowed |
| CONCURRENTLY WHERE | EXCLUSIVE | Allowed | Blocked |
Historically, CONCURRENTLY means "more permissive" — it allows readers. But with WHERE, the non-concurrent path is already more permissive (allows both readers and writers), making CONCURRENTLY the more restrictive option. This is unintuitive and unresolved in the thread.
The author notes that CONCURRENTLY has historically referred specifically to the diff-based algorithm, not just a lower lock level, suggesting leaving the implementations as-is despite the semantic oddity.
Unique Index Requirement
Both refresh modes require a unique index on the MV. For non-concurrent partial refresh, this is needed for:
- The ON CONFLICT target in UPSERT operations
- Identifying which rows to lock with FOR UPDATE
- The anti-join to detect rows that should be deleted (no longer in source data)
This is a new requirement for non-concurrent refresh (full refresh doesn't need it), which may surprise users.
Volatile Function Restriction
The WHERE clause forbids volatile functions to ensure correctness — the predicate must be deterministic across the refresh operation. This is enforced in transformRefreshWhereClause().
Bug Found in Review
Dharin Shah identified a concrete bug: when building the ON CONFLICT target list, the code used indnatts (which includes INCLUDE columns) rather than indnkeyatts (key attributes only). For an index like CREATE UNIQUE INDEX ON mv(id) INCLUDE (extra), this generates ON CONFLICT (id, extra) which fails because INCLUDE columns aren't part of the uniqueness constraint. The fix is straightforward — use indnkeyatts for conflict target generation.
Performance Characteristics
The benchmark results demonstrate the O(delta) nature of partial refresh:
- At single-row granularity (batch=1), partial refresh achieves ~43-45% of baseline TPS — significant overhead per-operation but constant regardless of MV size
- At batch=50 with 4+ clients, partial refresh matches or exceeds baseline due to reduced lock contention vs. full refresh
- At batch=1000, full refresh becomes competitive for small datasets but partial refresh maintains advantage at scale (400k rows)
- Partial refresh performs comparably to hand-maintained trigger tables, validating the design goal
The critical insight: partial refresh performance is independent of total MV size, depending only on the number of affected rows and the efficiency of predicate push-down to base tables.
Open Questions and Unresolved Issues
- Deadlock potential: With FOR UPDATE + UPSERT, overlapping refreshes could potentially deadlock if they lock rows in different orders. No mitigation is discussed.
- Predicate push-down reliability: The feature's performance depends on the optimizer being able to push the WHERE condition down to base tables. Complex views with aggregation, CTEs, or window functions may not benefit.
- SQL standard deviation: This is explicitly a non-standard extension. The thread doesn't discuss whether alternative syntax (e.g., a separate command) would be more appropriate.
- Community interest signal: The author explicitly asked whether the community wants this feature in core. No committer has responded in the thread, which is notable for a feature of this scope.
- IVM interaction: PostgreSQL has ongoing IVM work. How this feature relates to or competes with full incremental view maintenance is not discussed.
Assessment
The patch addresses a real and widely-felt pain point. The technical approach is sound but complex, particularly around concurrency. The lack of committer engagement in the thread is concerning for a feature that touches grammar, parser, executor, locking, and SPI. The feature would benefit from a clear statement of intended guarantees and explicit documentation of limitations (unique index requirement, predicate push-down dependency, potential deadlocks).