Avoid calling SetMatViewPopulatedState if possible

First seen: 2026-04-10 03:36:43+00:00 · Messages: 6 · Participants: 3

Latest Update

2026-06-01 · claude-opus-4-6

Monthly Summary: Avoid calling SetMatViewPopulatedState if possible (May 2026)

Overview

A focused optimization patch to eliminate unnecessary pg_class catalog updates during REFRESH MATERIALIZED VIEW. The thread progressed from initial submission through a substantive technical challenge to a partial resolution, with the key debate centering on the actual scope of the optimization's benefit.

The Problem

SetMatViewPopulatedState() is called unconditionally at the end of ExecRefreshMatView() to mark pg_class.relispopulated = true. For already-populated materialized views (the overwhelmingly common case), this performs a semantically no-op heap_update that nonetheless produces MVCC bloat in pg_class — a catalog read by virtually every query during planning. The submitter demonstrated this via ctid churn on the matview's pg_class row across repeated REFRESH MATERIALIZED VIEW CONCURRENTLY calls.

The Proposed Fix

Guard the SetMatViewPopulatedState(matviewOid, true) call with a check of the current relispopulated flag. If already true, skip the catalog write entirely. This follows the established PostgreSQL pattern of "avoid no-op catalog updates" (as seen in ATExecChangeOwner and various ALTER paths).

Key Technical Debate

David Geier raised a fundamental objection: REFRESH MATERIALIZED VIEW already rewrites pg_class via finish_heap_swap() (updating relfilenode, relpages, reltuples, etc.), so skipping one additional write shouldn't eliminate ctid churn. This surfaced a critical distinction:

The submitter confirmed the CONCURRENTLY mechanism and asserted the optimization has value "in all cases" — technically correct since even in non-CONCURRENTLY mode it prevents generating a second dead pg_class tuple within the same transaction.

Current Status

The patch is in review. Geier concedes the principle ("Avoiding the bloat seems generally reasonable") but pushed for clearer explanation of why the demonstration worked. The submitter responded with the CONCURRENTLY mechanism explanation. No committer has weighed in. The ball is in Geier's court for either acceptance or further requests (e.g., non-CONCURRENTLY ctid demonstration, or a suggestion to unify both pg_class writes in the non-CONCURRENTLY path).

Open Questions

  1. Should the non-CONCURRENTLY path be refactored to fold relispopulated into the existing finish_heap_swap() pg_class write, achieving a single write per refresh?
  2. Whether two heap_update calls on the same tuple within one transaction can be coalesced by HOT (likely not).
  3. Should this be backpatched? (Likely no — optimization, not bug fix.)
History (1 prior analysis)
2026-06-01 · claude-opus-4-6

New Development: Third-Party Reviewer Validates Patch, Offers Style Suggestion

A new participant (44973863@qq.com, signing as "xman") enters the thread with a detailed review that largely confirms the submitter's position and adds independent testing verification. While mostly reinforcing existing conclusions, there are a few new contributions worth noting.

What's Actually New

  1. Independent test verification: The new participant ran the full PostgreSQL regression test suite with the patch applied and confirms it passes cleanly, including the matview-specific tests. This is the first explicit confirmation of test passage beyond the submitter's own claims.

  2. Style/readability suggestion for the guard condition: The reviewer suggests changing the condition from:

    if (RelationIsPopulated(matviewRel) == skipData)
    

    to:

    if (RelationIsPopulated(matviewRel) != !skipData)
    

    arguing the latter is more readable when paired with the existing call SetMatViewPopulatedState(matviewRel, !skipData). This is a minor bikeshed — both are logically equivalent (A == B vs A != !B when B is boolean are identical) — but it shows attention to the actual patch code.

  3. Design pattern endorsement: The reviewer notes that intorel_startup() in createas.c follows the same "caller checks before calling callee" pattern, providing precedent within the codebase for the patch's approach (checking at the call site rather than inside SetMatViewPopulatedState itself).

  4. Two implementation approaches articulated: The reviewer explicitly lays out the design choice between (a) making SetMatViewPopulatedState() internally skip no-op updates vs. (b) having the caller check first. They note the patch chose (b) and endorse it based on existing precedent.

What's NOT New (Just Restated)

The bulk of the message restates points already established:

  • SetMatViewPopulatedState() only modifies relispopulated (already clarified by cca5507)
  • Skipping it avoids dead tuples and WAL overhead (already established)
  • The optimization matters for BI/OLAP workloads with frequent refreshes (already noted)
  • Other pg_class columns are updated by separate routines (already established by cca5507's CONCURRENTLY explanation)

Review Dynamic

This is a supportive +1 with some added substance (test results, style suggestion, precedent citation). Geier has still not responded to cca5507's rebuttal, so the fundamental question of whether the reviewer accepts the explanation remains open. The thread now has two people endorsing the patch's approach.