2026-06-04 · claude-opus-4-6

Incremental Update: Review Feedback and Series Reordering Discussion

Summary

The thread received substantive review feedback from two reviewers (Sami Imseih and Michael Paquier), leading the author (Lukas Fittl) to consider reordering the patch series and deferring patch 0001 (unified Instrumentation struct).

Key Technical Debate: Memory Overhead of Unified Instrumentation Struct

The central new technical disagreement is about patch 0001's approach of replacing the separate BufferUsage and WalUsage structs with a larger Instrumentation struct in dynamic shared memory (DSM):

Michael Paquier's concern: Instrumentation is larger than WalUsage + BufferUsage combined, and contains fields (like _start fields) that workers have no use for. This increases DSM footprint unnecessarily and is conceptually confusing — why push fields to workers that they'll never use?
Sami Imseih's counter-argument: The actual overhead is 192 bytes per worker (360 bytes for Instrumentation vs 168 bytes for BufferUsage + WalUsage). Even at 32 workers, that's only ~6 KB — negligible compared to the code cleanup benefits.
Lukas Fittl's resolution: Acknowledges both points. Notes that the overhead would be much smaller if and when the stack-based instrumentation approach lands, since that would eliminate the _start fields. Proposes reordering the series to lead with the less controversial patches (0003 query text helpers, 0004 shared parallel index build helpers) and deferring 0001/0002 until the stack-based instrumentation design is settled.
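The overhead figures in Sami's counter-argument can be rechecked with a few lines of C. This is only a sketch of the arithmetic: the two sizes are the ones quoted in the thread and will vary by platform and build, and the function names are hypothetical.

```c
#include <stddef.h>

/* Sizes as quoted in the thread; they depend on the build, so treat them
 * as inputs to the calculation rather than as fixed PostgreSQL constants. */
enum
{
    INSTRUMENTATION_BYTES = 360,        /* unified Instrumentation struct */
    BUFFER_PLUS_WAL_USAGE_BYTES = 168   /* BufferUsage + WalUsage combined */
};

/* Extra DSM bytes that one worker slot costs under the unified struct. */
static size_t
extra_bytes_per_worker(void)
{
    return INSTRUMENTATION_BYTES - BUFFER_PLUS_WAL_USAGE_BYTES;
}

/* Extra DSM bytes summed over all worker slots. */
static size_t
extra_bytes_total(size_t nworkers)
{
    return nworkers * extra_bytes_per_worker();
}
```

With these inputs, extra_bytes_per_worker() is 192 and extra_bytes_total(32) is 6144 bytes, which is the roughly 6 KB cited above.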

Specific Review Comments on Individual Patches

0003 (Query Text Helpers):

Sami initially went back and forth on RestoreParallelQueryText() combining the debug_query_string assignment with pgstat_report_activity(), but concluded it's acceptable for deduplication purposes.
Comment accuracy issue: the comment "no-op (leaving debug_query_string NULL)" is inaccurate, because the function does set debug_query_string to NULL, so it is not truly a no-op.
Michael sees "nice advantages" in this patch — less duplicated logic across three parallel worker callbacks.

0004 (Shared Parallel Index Build Helpers):

Sami notes that the "lossy AMs" comment is inaccurate: GIN doesn't use the havedead/brokenhotchain fields, but it isn't a lossy AM.
Typo: there is an extra space before "launched" in a comment.
Important structural feedback: The GIN queryid fix mentioned in 0004's commit message should be split into its own independent patch and placed first in the series.
Michael praises the lock-level and snapshot deduplication as "quite beneficial in the long-term."
Both reviewers agree that extending deduplication to parallel VACUUM is unnecessary — the operations are too different.

Emerging Consensus

Patches 0003 and 0004 have strong support from both reviewers
Patches 0001 and 0002 are controversial due to the memory overhead / unused fields concern
The GIN queryid bugfix should be separated out
The series will likely be reordered with 0003/0004 leading

2026-06-01 · claude-opus-4-6

Unify Parallel Worker Handling for Index Builds and Instrumentation

Core Problem

PostgreSQL's parallel index build infrastructure has accumulated significant code duplication across different index access methods (B-tree, BRIN, and potentially others). Each index AM independently implements:

Shared memory estimation and setup for parallel workers
Instrumentation data passing (WAL usage, buffer usage statistics) between leader and workers
Query text propagation to parallel worker processes
Metadata feedback mechanisms from workers back to the leader

This duplication is not merely aesthetic — it creates real maintenance burden and makes it harder to add new cross-cutting concerns (like unified instrumentation) that need to interact with parallel worker infrastructure. When instrumentation needs change, every index AM's parallel build code must be updated independently, increasing the risk of inconsistencies and bugs.

Architectural Context

Parallel index builds in PostgreSQL work by having the leader process set up a ParallelContext with shared memory segments. Workers attach to this shared memory, perform their portion of the index build (typically sorting/inserting tuples), and report back statistics. Currently, each AM (primarily nbtsort.c for B-tree and brin.c for BRIN) has its own implementation of:

_bt_begin_parallel() / equivalent BRIN functions for DSM setup
Custom shared state structures with embedded BufferUsage and WalUsage arrays
Worker-side attachment and statistics accumulation logic
Leader-side aggregation of per-worker statistics

The instrumentation data (WAL usage, buffer usage) is particularly fragmented — these are tracked as separate fields despite always being used together in the parallel worker context.
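To make the fragmentation concrete, the following is a minimal sketch of the leader-side aggregation pattern described above, with one array per statistics type indexed by worker number. The Demo* structs are simplified stand-ins invented for illustration (the real BufferUsage and WalUsage carry many more counters), and demo_accumulate_workers() is a hypothetical name, not a PostgreSQL function.

```c
#include <stdint.h>

/* Simplified stand-ins for illustration only. */
typedef struct DemoBufferUsage
{
    int64_t shared_blks_hit;
    int64_t shared_blks_read;
} DemoBufferUsage;

typedef struct DemoWalUsage
{
    int64_t  wal_records;
    uint64_t wal_bytes;
} DemoWalUsage;

/*
 * Leader side: fold every worker's slot from the two parallel arrays
 * into running totals. Both arrays must be walked in lockstep, which is
 * the bookkeeping each AM currently repeats on its own.
 */
static void
demo_accumulate_workers(const DemoBufferUsage *bufusage,
                        const DemoWalUsage *walusage,
                        int nworkers,
                        DemoBufferUsage *buf_total,
                        DemoWalUsage *wal_total)
{
    for (int i = 0; i < nworkers; i++)
    {
        buf_total->shared_blks_hit += bufusage[i].shared_blks_hit;
        buf_total->shared_blks_read += bufusage[i].shared_blks_read;
        wal_total->wal_records += walusage[i].wal_records;
        wal_total->wal_bytes += walusage[i].wal_bytes;
    }
}
```

Because the two arrays are always allocated, indexed, and summed together, they behave like a single per-worker record that happens to be split across two allocations.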

Proposed Solution (4-Patch Series)

0001: Unified Instrumentation Struct

Introduces a new Instrumentation struct that combines WalUsage and BufferUsage into a single unit. This is a foundational change that simplifies all downstream code that currently passes these two structures around separately. This aligns with how the data is conceptually used — you never want WAL stats without buffer stats in an instrumentation context.

0002: Shared Memory Helpers for Instrumentation

Provides reusable functions to:

Estimate shared memory size needed for per-worker instrumentation (shm_toc_estimate_instrumentation() or similar)
Store/retrieve Instrumentation structs in DSM segments

This eliminates the pattern where each AM manually calculates sizeof(BufferUsage) * nworkers + sizeof(WalUsage) * nworkers and manages the pointer arithmetic.
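A rough sketch of the difference, assuming a unified per-worker struct: the manual estimate sums two products, while a shared helper multiplies one struct size by the worker count and hands out slots by index. Every name here is hypothetical and the structs are simplified stand-ins; the actual helper names and signatures in the patch are not given in this summary.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; field counts are arbitrary. */
typedef struct DemoBufferUsage { int64_t counters[4]; } DemoBufferUsage;
typedef struct DemoWalUsage    { int64_t counters[2]; } DemoWalUsage;

/* Hypothetical unified per-worker record. */
typedef struct DemoInstrumentation
{
    DemoBufferUsage bufusage;
    DemoWalUsage    walusage;
} DemoInstrumentation;

/* The calculation each AM repeats today: two arrays, two products. */
static size_t
demo_estimate_manual(int nworkers)
{
    return sizeof(DemoBufferUsage) * (size_t) nworkers
         + sizeof(DemoWalUsage) * (size_t) nworkers;
}

/* Hypothetical shared helper: one array of unified records. */
static size_t
demo_estimate_instrumentation(int nworkers)
{
    return sizeof(DemoInstrumentation) * (size_t) nworkers;
}

/* Hypothetical accessor: the slot belonging to a given worker. */
static DemoInstrumentation *
demo_worker_slot(void *base, int worker)
{
    return (DemoInstrumentation *) base + worker;
}
```

In this stand-in the two estimates come out equal because the unified struct holds nothing beyond the two members. In the thread, the real Instrumentation struct is larger than the pair, which is the overhead the reviewers debated.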

0003: Query Text Parallel Worker Helpers

Abstracts the common pattern of propagating the current query text to parallel workers. This is needed for pg_stat_activity visibility and potentially for instrumentation/logging in worker processes. Currently each parallel operation re-implements this propagation.
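The propagation pattern can be sketched as three steps: the leader sizes the text, copies it into shared memory, and each worker points its own query string at the shared copy. This is a self-contained illustration under assumptions, using a plain variable in place of PostgreSQL's debug_query_string and hypothetical demo_* names; it does not reproduce the patch's actual helpers.

```c
#include <stddef.h>
#include <string.h>

/* Plain-variable stand-in for the worker's debug_query_string. */
static const char *demo_debug_query_string = NULL;

/* Leader: bytes to reserve in shared memory, or 0 if there is no text. */
static size_t
demo_estimate_query_text(const char *query)
{
    return query ? strlen(query) + 1 : 0;
}

/* Leader: copy the text, including its terminator, into the shared area. */
static void
demo_save_query_text(char *shared, const char *query)
{
    memcpy(shared, query, strlen(query) + 1);
}

/*
 * Worker: adopt the shared copy as the current query string. When no
 * text was stored, this assigns NULL, which is an assignment rather
 * than a no-op.
 */
static void
demo_restore_query_text(const char *shared)
{
    demo_debug_query_string = shared;
    /* Per the thread, the real helper also reports activity at this point. */
}
```

Keeping the three steps behind one set of helpers means each parallel operation no longer has to repeat the length calculation and the copy.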

0004: Shared Parallel Index Build Helpers

The most architecturally significant and uncertain patch. Introduces common helper functions that abstract the parallel index build lifecycle:

Worker launch and shared memory setup common to all index AMs
Standard patterns for worker statistics aggregation
Potentially a callback-based design where AMs register their specific build logic
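One way to picture the callback-based variant is a small table of function pointers that each AM fills in, driven by shared code. The sketch below is speculative and entirely hypothetical: the summary does not say what interface the patch uses, and the driver here simply calls the worker callback in a loop instead of launching background workers.

```c
#include <stddef.h>

/* Hypothetical per-AM callback table. */
typedef struct DemoParallelBuildCallbacks
{
    /* AM-specific shared memory the build needs, in bytes. */
    size_t (*estimate_shared)(int nworkers);
    /* AM-specific work done by one worker. */
    void   (*worker_main)(void *shared, int worker_number);
} DemoParallelBuildCallbacks;

/*
 * Shared driver. A real implementation would set up DSM and launch
 * workers; here the "workers" run one after another in this process.
 */
static void
demo_run_parallel_build(const DemoParallelBuildCallbacks *cb,
                        void *shared, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
        cb->worker_main(shared, i);
}

/* Example AM: each worker records a value in its own slot. */
static size_t
demo_am_estimate(int nworkers)
{
    return sizeof(long) * (size_t) nworkers;
}

static void
demo_am_worker(void *shared, int worker_number)
{
    ((long *) shared)[worker_number] = 100 + worker_number;
}

static const DemoParallelBuildCallbacks demo_am_callbacks = {
    demo_am_estimate,
    demo_am_worker
};
```

The appeal of this shape is that launch and aggregation live in one place. The cost is that every AM must fit its build logic into the same callback signatures, which is the premature-abstraction risk the summary raises.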

Key Design Tensions

Scope of Abstraction

The author explicitly notes uncertainty about 0004's design. The fundamental question is: what's the right abstraction boundary?

Narrow approach: Only abstract what's common to parallel index builds (B-tree, BRIN, future AMs). This maximizes code savings for the specific case but doesn't help parallel VACUUM or other maintenance commands.
Broad approach: Refactor parallel worker handling for all maintenance commands (VACUUM, index builds, potentially REINDEX). This is more architecturally pure but may force awkward abstractions where the operations don't actually share much beyond basic DSM setup.

The author leans toward the narrow (index-build-specific) approach because it allows more meaningful deduplication, suggesting the index build paths share substantially more structure than they do with VACUUM.

Relation to Stack-Based Instrumentation

This patch series emerged from work on a "stack-based instrumentation patch" (referenced but not detailed). The refactoring is motivated by making it feasible to add new instrumentation without touching every parallel code path independently. This suggests a larger ongoing effort to improve PostgreSQL's observability infrastructure.

Implications

If accepted, this refactoring would:

Make it significantly easier to add new index AMs with parallel build support
Reduce the cost of adding new instrumentation counters (single point of change)
Potentially enable more consistent error handling and resource cleanup across parallel index operations
Set a pattern for how future parallel maintenance operations should be structured

The risk is premature abstraction — if the helper functions don't cleanly accommodate the varying needs of different index AMs, the abstraction becomes a hindrance rather than a help.
