Unify parallel worker handling for index builds and instrumentation

First seen: 2026-05-31 19:01:40+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-06-01 · claude-opus-4-6

Core Problem

PostgreSQL's parallel index build infrastructure has accumulated significant code duplication across different index access methods (B-tree, BRIN, and potentially others). Each index AM independently implements:

  1. Shared memory estimation and setup for parallel workers
  2. Instrumentation data passing (WAL usage, buffer usage statistics) between leader and workers
  3. Query text propagation to parallel worker processes
  4. Metadata feedback mechanisms from workers back to the leader

This duplication is more than an aesthetic problem: it adds maintenance burden and makes it harder to introduce cross-cutting features, such as unified instrumentation, that touch the parallel worker infrastructure. When instrumentation requirements change, each index AM's parallel build code has to be updated separately, which raises the risk of inconsistencies and bugs.

Architectural Context

Parallel index builds in PostgreSQL work by having the leader process set up a ParallelContext with shared memory segments. Workers attach to this shared memory, perform their portion of the index build (typically sorting/inserting tuples), and report statistics back to the leader. Currently, each AM (primarily nbtsort.c for B-tree and brin.c for BRIN) carries its own implementation of the four pieces listed under Core Problem.

The instrumentation data (WAL usage and buffer usage) is particularly fragmented: the two are tracked as separate fields even though the parallel worker code always uses them together.

Proposed Solution (4-Patch Series)

0001: Unified Instrumentation Struct

Introduces a new Instrumentation struct that combines WalUsage and BufferUsage into a single unit. This is the foundation for the rest of the series: downstream code that currently passes the two structures separately can pass one value instead. It also matches how the data is used, since the parallel worker code always collects and reports WAL and buffer statistics together.

0002: Shared Memory Helpers for Instrumentation

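Provides reusable functions for sizing and setting up the shared memory that carries instrumentation data between the leader and its workers.

This eliminates the pattern where each AM manually computes sizeof(BufferUsage) * nworkers + sizeof(WalUsage) * nworkers and does its own pointer arithmetic to locate a worker's slot.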
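
A minimal sketch of what such helpers could look like, assuming one fixed-size slot per worker. The names InstrSlot, instr_shmem_size, and instr_shmem_slot are hypothetical and are not taken from the patch. The slot is a flattened stand-in for the combined instrumentation data, and plain memory stands in for the shared segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-worker slot holding buffer and WAL counters together. */
typedef struct InstrSlot
{
    int64_t     shared_blks_hit;
    int64_t     shared_blks_read;
    uint64_t    wal_bytes;
} InstrSlot;

/* Bytes needed for one slot per worker, guarding against overflow. */
static size_t
instr_shmem_size(int nworkers)
{
    assert(nworkers >= 0);
    assert((size_t) nworkers <= SIZE_MAX / sizeof(InstrSlot));

    return sizeof(InstrSlot) * (size_t) nworkers;
}

/* Locate one worker's slot in a region sized by instr_shmem_size(). */
static InstrSlot *
instr_shmem_slot(void *base, int nworkers, int worker)
{
    assert(base != NULL);
    assert(worker >= 0 && worker < nworkers);

    return (InstrSlot *) base + worker;
}
```

Keeping the size calculation and the slot lookup in one place means the layout can change without each AM having to repeat the arithmetic.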

0003: Query Text Parallel Worker Helpers

Abstracts the common pattern of propagating the current query text to parallel workers. Workers need the text so that pg_stat_activity shows what they are running, and potentially for instrumentation and logging in the worker processes. Currently each parallel operation re-implements this propagation.
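
The pattern being abstracted is small: measure the text, reserve that many bytes plus a terminator, and copy it where workers can read it. The sketch below shows that shape with ordinary memory standing in for the shared segment. The function names query_text_size and query_text_store are hypothetical, and the sketch assumes the query text may be absent (NULL).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes to reserve for the query text, or 0 when there is no text. */
static size_t
query_text_size(const char *query)
{
    return query ? strlen(query) + 1 : 0;
}

/*
 * Leader side: copy the text, including its terminator, into the area
 * that workers will read. Returns NULL when there is nothing to share.
 */
static char *
query_text_store(char *shared, size_t shared_size, const char *query)
{
    size_t      need = query_text_size(query);

    if (need == 0)
        return NULL;

    assert(shared != NULL && shared_size >= need);
    memcpy(shared, query, need);
    return shared;
}
```

A worker would then treat the returned pointer as its own query string for reporting purposes.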

0004: Shared Parallel Index Build Helpers

This is the most architecturally significant patch, and the one whose design is least settled. It introduces common helper functions that abstract the parallel index build lifecycle.
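
One way to picture a shared lifecycle helper is a common driver that calls AM-specific hooks. The sketch below is purely illustrative: ParallelBuildHooks, parallel_build_run, and demo_worker_build are invented for this example, the actual patch may divide the work quite differently, and the "workers" here run one after another in a single process rather than in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-AM hooks supplied to the shared driver. */
typedef struct ParallelBuildHooks
{
    /* Do one worker's share of the build; return the tuples it handled. */
    long        (*worker_build) (int worker, int nworkers, void *am_state);
} ParallelBuildHooks;

/*
 * Shared driver: run every worker's share and sum the results for the
 * leader. A real build would launch the workers concurrently.
 */
static long
parallel_build_run(const ParallelBuildHooks *hooks, int nworkers,
                   void *am_state)
{
    long        total = 0;

    assert(hooks != NULL && hooks->worker_build != NULL && nworkers > 0);

    for (int w = 0; w < nworkers; w++)
        total += hooks->worker_build(w, nworkers, am_state);

    return total;
}

/*
 * Example hook: am_state points to a tuple count, split evenly with the
 * remainder going to the lowest-numbered workers.
 */
static long
demo_worker_build(int worker, int nworkers, void *am_state)
{
    long        ntuples = *(long *) am_state;

    return ntuples / nworkers + (worker < ntuples % nworkers ? 1 : 0);
}
```

The point of the shape is that the driver owns the parts every AM repeats, while each AM supplies only what differs.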

Key Design Tensions

Scope of Abstraction

The author explicitly notes uncertainty about 0004's design. The fundamental question is: what's the right abstraction boundary?

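The choice is between a narrow abstraction specific to index builds and a broader one that would also cover other parallel maintenance operations such as VACUUM. The author leans toward the narrow approach because it allows more meaningful deduplication, which suggests the index build paths share substantially more structure with each other than they do with VACUUM.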

Relation to Stack-Based Instrumentation

This patch series emerged from work on a "stack-based instrumentation patch" (referenced but not detailed). The refactoring is motivated by making it feasible to add new instrumentation without touching every parallel code path independently. This suggests a larger ongoing effort to improve PostgreSQL's observability infrastructure.

Implications

If accepted, this refactoring would:

  1. Make it significantly easier to add new index AMs with parallel build support
  2. Reduce the cost of adding new instrumentation counters (single point of change)
  3. Potentially enable more consistent error handling and resource cleanup across parallel index operations
  4. Set a pattern for how future parallel maintenance operations should be structured

The risk is premature abstraction: if the helper functions do not cleanly accommodate the differing needs of individual index AMs, the abstraction becomes a hindrance rather than a help.