Unify Parallel Worker Handling for Index Builds and Instrumentation
Core Problem
PostgreSQL's parallel index build infrastructure has accumulated significant code duplication across different index access methods (B-tree, BRIN, and potentially others). Each index AM independently implements:
- Shared memory estimation and setup for parallel workers
- Instrumentation data passing (WAL usage, buffer usage statistics) between leader and workers
- Query text propagation to parallel worker processes
- Metadata feedback mechanisms from workers back to the leader
This duplication is more than cosmetic: it creates a real maintenance burden and makes it harder to add cross-cutting concerns (like unified instrumentation) that need to interact with the parallel worker infrastructure. When instrumentation needs change, every index AM's parallel build code must be updated independently, which increases the risk of inconsistencies and bugs.
Architectural Context
Parallel index builds in PostgreSQL work by having the leader process set up a ParallelContext with shared memory segments. Workers attach to this shared memory, perform their portion of the index build (typically sorting/inserting tuples), and report back statistics. Currently, each AM (primarily nbtsort.c for B-tree and brin.c for BRIN) has its own implementation of:
- _bt_begin_parallel() / equivalent BRIN functions for DSM setup
- Custom shared state structures with embedded BufferUsage and WalUsage arrays
- Worker-side attachment and statistics accumulation logic
- Leader-side aggregation of per-worker statistics
The instrumentation data (WAL usage, buffer usage) is particularly fragmented: the two are tracked as separate fields even though they are always used together in the parallel worker context.
Proposed Solution (4-Patch Series)
0001: Unified Instrumentation Struct
Introduces a new Instrumentation struct that combines WalUsage and BufferUsage into a single unit. This is a foundational change that simplifies all downstream code that currently passes the two structures around separately. It also matches how the data is used in the parallel worker context, where WAL and buffer statistics always travel together.
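A minimal sketch of the idea, not the patch's actual definition: the two-field BufferUsage and WalUsage below are simplified stand-ins for the real PostgreSQL structs, and DemoInstrumentation and demo_instr_accum() are hypothetical names chosen for illustration. It shows how one combined unit lets a leader fold in a worker's counters with a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real structs carry many more counters. */
typedef struct BufferUsage
{
    int64_t shared_blks_hit;
    int64_t shared_blks_read;
} BufferUsage;

typedef struct WalUsage
{
    int64_t  wal_records;
    uint64_t wal_bytes;
} WalUsage;

/* Hypothetical combined unit; the layout is an assumption. */
typedef struct DemoInstrumentation
{
    BufferUsage bufusage;
    WalUsage    walusage;
} DemoInstrumentation;

/* Add src's counters into dst, covering both kinds of usage in one call. */
static void
demo_instr_accum(DemoInstrumentation *dst, const DemoInstrumentation *src)
{
    assert(dst != NULL && src != NULL);
    dst->bufusage.shared_blks_hit += src->bufusage.shared_blks_hit;
    dst->bufusage.shared_blks_read += src->bufusage.shared_blks_read;
    dst->walusage.wal_records += src->walusage.wal_records;
    dst->walusage.wal_bytes += src->walusage.wal_bytes;
}
```

With this shape, adding a new counter would mean touching the struct and the accumulate function rather than every call site that passes the two structs separately.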
0002: Shared Memory Helpers for Instrumentation
Provides reusable functions to:
- Estimate shared memory size needed for per-worker instrumentation (shm_toc_estimate_instrumentation() or similar)
- Store/retrieve Instrumentation structs in DSM segments
This eliminates the pattern where each AM manually calculates sizeof(BufferUsage) * nworkers + sizeof(WalUsage) * nworkers and manages the pointer arithmetic.
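As a rough, standalone illustration of what such helpers might hide (this does not use the real shm_toc API): demo_instr_estimate() and demo_instr_slot() are hypothetical names, and the flat DemoInstrumentation struct is a simplified assumption. The sketch does one overflow-checked size calculation and one bounds-checked lookup of a worker's slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a combined per-worker instrumentation record. */
typedef struct DemoInstrumentation
{
    int64_t shared_blks_hit;
    int64_t wal_records;
} DemoInstrumentation;

/*
 * Bytes needed for one slot per worker.
 * Returns 0 for a non-positive worker count or if the size would overflow.
 */
static size_t
demo_instr_estimate(int nworkers)
{
    if (nworkers <= 0)
        return 0;
    if ((size_t) nworkers > SIZE_MAX / sizeof(DemoInstrumentation))
        return 0;
    return (size_t) nworkers * sizeof(DemoInstrumentation);
}

/* Return the slot belonging to one worker inside the shared area. */
static DemoInstrumentation *
demo_instr_slot(void *area, int nworkers, int worker)
{
    assert(area != NULL);
    assert(worker >= 0 && worker < nworkers);
    return (DemoInstrumentation *) area + worker;
}
```

Callers then deal with one array of combined records instead of two parallel arrays and their separate offsets.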
0003: Query Text Parallel Worker Helpers
Abstracts the common pattern of propagating the current query text to parallel workers. This is needed for pg_stat_activity visibility and potentially for instrumentation/logging in worker processes. Currently each parallel operation re-implements this propagation.
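The pattern being abstracted can be sketched in plain C as follows. This is a simplified assumption of the shape, with a char buffer standing in for the DSM chunk; demo_query_text_size() and demo_query_text_store() are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes to reserve for the query text including its terminating NUL; 0 if there is none. */
static size_t
demo_query_text_size(const char *query)
{
    return query != NULL ? strlen(query) + 1 : 0;
}

/*
 * Leader side: copy the query text into a pre-sized shared area.
 * Returns the area on success, or NULL if there is no query text
 * or the area is too small.
 */
static char *
demo_query_text_store(char *area, size_t area_size, const char *query)
{
    size_t need = demo_query_text_size(query);

    if (need == 0 || area == NULL || need > area_size)
        return NULL;
    memcpy(area, query, need);
    return area;
}
```

A worker would then read the stored string from the shared area and use it wherever it reports its current activity.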
0004: Shared Parallel Index Build Helpers
The most architecturally significant and uncertain patch. Introduces common helper functions that abstract the parallel index build lifecycle:
- Worker launch and shared memory setup common to all index AMs
- Standard patterns for worker statistics aggregation
- Potentially a callback-based design where AMs register their specific build logic
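To make the callback idea concrete, here is a toy sketch under heavy assumptions: DemoParallelBuildOps, demo_parallel_build() and demo_am_worker() are hypothetical names, and the driver calls the worker callback serially in one process instead of launching real parallel workers. It only shows the division of labor, with the shared driver owning the loop and the aggregation and the AM supplying its build logic.

```c
#include <assert.h>
#include <stddef.h>

/* Per-worker result reported back to the leader (simplified). */
typedef struct DemoBuildStats
{
    long tuples;
} DemoBuildStats;

/* Hypothetical callback table that an index AM would fill in. */
typedef struct DemoParallelBuildOps
{
    void (*worker_main) (int worker, void *am_state, DemoBuildStats *stats);
} DemoParallelBuildOps;

/*
 * Shared driver: invokes the AM callback once per worker and sums the
 * results. A real implementation would launch processes; this runs serially.
 */
static DemoBuildStats
demo_parallel_build(const DemoParallelBuildOps *ops, void *am_state, int nworkers)
{
    DemoBuildStats total = {0};

    assert(ops != NULL && ops->worker_main != NULL);
    for (int w = 0; w < nworkers; w++)
    {
        DemoBuildStats s = {0};

        ops->worker_main(w, am_state, &s);
        total.tuples += s.tuples;
    }
    return total;
}

/* Example AM callback: every worker reports the same fixed tuple count. */
static void
demo_am_worker(int worker, void *am_state, DemoBuildStats *stats)
{
    (void) worker;              /* unused in this toy callback */
    stats->tuples = *(long *) am_state;
}
```

Whether a callback table like this fits depends on how much the AMs' build phases really have in common, which is the open design question for this patch.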
Key Design Tensions
Scope of Abstraction
The author explicitly notes uncertainty about 0004's design. The fundamental question is: what's the right abstraction boundary?
- Narrow approach: Only abstract what's common to parallel index builds (B-tree, BRIN, future AMs). This maximizes code savings for the specific case but doesn't help parallel VACUUM or other maintenance commands.
- Broad approach: Refactor parallel worker handling for all maintenance commands (VACUUM, index builds, potentially REINDEX). This is more architecturally pure but may force awkward abstractions where the operations don't actually share much beyond basic DSM setup.
The author leans toward the narrow (index-build-specific) approach because it allows more meaningful deduplication, suggesting the index build paths share substantially more structure than they do with VACUUM.
Relation to Stack-Based Instrumentation
This patch series emerged from work on a "stack-based instrumentation patch" (referenced but not detailed). The refactoring is motivated by making it feasible to add new instrumentation without touching every parallel code path independently. This suggests a larger ongoing effort to improve PostgreSQL's observability infrastructure.
Implications
If accepted, this refactoring would:
- Make it significantly easier to add new index AMs with parallel build support
- Reduce the cost of adding new instrumentation counters (single point of change)
- Potentially enable more consistent error handling and resource cleanup across parallel index operations
- Set a pattern for how future parallel maintenance operations should be structured
The risk is premature abstraction: if the helper functions don't cleanly accommodate the varying needs of different index AMs, the abstraction becomes a hindrance rather than a help.