Overview
This is a small, focused optimization patch for REFRESH MATERIALIZED VIEW that targets an unnecessary catalog update performed on every refresh. The thread is short (three messages) but surfaces a non-obvious correctness/bloat concern that elevates it above a pure micro-optimization.
The Core Problem
SetMatViewPopulatedState() is called unconditionally at the end of ExecRefreshMatView() to mark the matview's pg_class.relispopulated flag as true. The observation from the submitter (cca5507) is that when a matview is already populated — which is the overwhelmingly common case for REFRESH MATERIALIZED VIEW (you can only refresh something that exists and is typically already populated after the first refresh) — this call performs a catalog update that changes nothing semantically.
The architectural cost isn't CPU cycles; it's MVCC bloat in pg_class. Every call to SetMatViewPopulatedState() performs a heap_update() on the matview's pg_class row, which under MVCC means:
- The old tuple is marked dead (xmax set).
- A new, logically identical tuple is inserted.
- Any indexes on pg_class get new entries (HOT may or may not apply, depending on indexed columns and page fill).
- Subsequent VACUUM/autovacuum work is required to reclaim the dead tuple.
The submitter demonstrates this concretely: the ctid of the pg_class row for matview m changes on every REFRESH MATERIALIZED VIEW CONCURRENTLY m, proving that a new heap tuple version is produced each time. On workloads that refresh matviews frequently (common for incremental-style analytics pipelines that refresh every few minutes), this produces a steady stream of pg_class bloat that autovacuum has to clean up — and pg_class is a catalog everyone hits, so its bloat has outsized performance implications.
The Proposed Fix
The patch (not quoted in full, but inferable from the description) wraps the SetMatViewPopulatedState(matviewRel, true) call with a guard that checks the current relispopulated flag on the matview's pg_class tuple. If the flag is already true, the call is skipped. Only when refreshing a matview created WITH NO DATA (where relispopulated = false) does the update actually fire.
This is a classic "avoid no-op catalog updates" pattern that PostgreSQL has applied elsewhere (e.g., ATExecChangeOwner skips when the owner is unchanged, and various ALTER paths check before updating).
Key Technical Considerations
Why the performance delta is invisible
The submitter correctly notes that benchmarking won't show much. A single heap_update on one pg_class row is dwarfed by the actual refresh work (scanning base tables, rebuilding the matview heap, possibly rebuilding indexes, or, for CONCURRENTLY, diffing and applying deltas). Geier's (geidav.pg) request for performance data is a reasonable reviewer reflex, but here the justification is correctness-adjacent (avoiding bloat) rather than latency.
Why pg_class bloat matters disproportionately
pg_class is read by virtually every query during planning (relation lookups, relcache building). Bloat here degrades:
- Sequential scan cost on catalog scans triggered by syscache misses.
- Index scan cost on pg_class_oid_index and pg_class_relname_nsp_index.
- Autovacuum pressure on a shared catalog, which takes ShareUpdateExclusiveLock and can conflict with DDL.
CONCURRENTLY mode makes this more important
REFRESH MATERIALIZED VIEW CONCURRENTLY is explicitly designed for frequent refreshes of populated matviews — it diffs and applies changes rather than truncate-and-reload. So the case where the flag is already true is essentially all of the CONCURRENTLY workload. The patch aligns the code with the intended usage pattern.
Race conditions / locking
ExecRefreshMatView already holds AccessExclusiveLock (for non-CONCURRENTLY) or ExclusiveLock (for CONCURRENTLY) on the matview, so reading relispopulated and deciding to skip the update is safe — no one else can flip the flag under us.
Design Tradeoffs
- Code complexity vs. benefit: Trivial added complexity (one extra catalog read or syscache lookup) in exchange for eliminating per-refresh pg_class churn. The syscache lookup is essentially free since the relcache entry is already in memory during the refresh.
- Read-before-write vs. unconditional write: Generally PostgreSQL prefers unconditional writes when the write is cheap and the read adds a cycle, but for catalog updates the MVCC tail makes the read-first approach strictly better.
Participant Analysis
- cca5507 (author): Submitter with a narrow, well-scoped patch. Their follow-up reply is the substantive one: demonstrating ctid churn is a much stronger argument than the original "small optimization" framing.
- geidav.pg (David Geier): A reviewer who (correctly) asked for motivation. This is the standard hackers gate for micro-optimizations: "show me the numbers or show me the use case." The author's response satisfies the second prong.
No committer has weighed in across the three messages shown, and no design disagreement has surfaced. The patch is the kind of thing that typically gets committed after one more round of review confirming the guard logic is correct and that there is no subtle case (e.g., recovery from a failed refresh) where forcing the flag write is desirable.
Open Questions Not Addressed in the Thread
- Should the same guard be added symmetrically when creating a matview WITH NO DATA and then later populating it? (Probably not an issue, since that path flips false to true.)
- Does REFRESH ... WITH NO DATA on an already-unpopulated matview have the same redundant write? Worth checking in the patch.
- Should this be backpatched? Likely not: it's a minor optimization, not a bug fix, and backpatching catalog-touching logic changes is generally avoided.