Reporting the Currently-Vacuumed Index in pg_stat_progress_vacuum
Core Problem
pg_stat_progress_vacuum exposes coarse counters (indexes_total, indexes_processed, heap_blks_scanned, etc.) but gives no visibility into which index is being worked on during the vacuuming indexes and cleaning up indexes phases. On tables mixing heterogeneous index AMs (btree, GIN, GiST, BRIN, and increasingly vector indexes like HNSW), the per-AM bulkdelete/vacuumcleanup cost varies by orders of magnitude. When an autovacuum worker appears "stuck", operators currently have no in-database way to attribute the stall to a specific index — they must resort to pstack, perf, or inference from locks. This is a recurring field pain point, particularly since the introduction of parallel index vacuum (PG 13) and the proliferation of expensive AMs.
Bharath's Proposed Design
The patch adds a current_index_relid column to pg_stat_progress_vacuum, populated via the existing pgstat_progress_update_param machinery at the points where vacuumparallel.c / vacuumlazy.c dispatch an index. Critically, the design chooses one progress row per process in the parallel case:
- The leader emits its row with current_index_relid set to whatever it is personally processing and leader_pid = NULL.
- Each parallel vacuum worker emits its own row, with leader_pid pointing back at the leader.
So that workers can report leader_pid, the patch introduces a global accessor, GetParallelLeaderPid(), which returns the static ParallelLeaderPid in the worker backend.
Design Tensions Surfaced in Review
Three distinct critiques emerge, each pointing at a different architectural layer.
1. Are per-worker rows the right shape? (Sami Imseih)
Sami argues that the pg_stat_progress_* family has an implicit contract: one row = one command with meaningful progress counters. A parallel worker row would be almost entirely NULL/redundant — it carries no heap_blks_scanned, no indexes_processed, only current_index_relid + leader_pid. That is status, not progress, and blurs the view's semantic model.
His counter-proposal is to aggregate worker state into the leader's row using arrays (worker_pids int[], current_index_relids oid[]). This is consistent with how pg_stat_progress_copy and similar views behave (single row per command) and avoids polluting the view when users run queries like SELECT count(*) FROM pg_stat_progress_vacuum.
The tradeoff: arrays are awkward to join against pg_class (requires unnest), whereas separate rows compose naturally with LEFT JOIN pg_class ic ON ic.oid = v.current_index_relid. Bharath's original example query demonstrates this ergonomic advantage. However, precedent in the progress-reporting infrastructure leans toward Sami's view — and the view's documentation explicitly frames rows as "commands in progress."
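To make the two row shapes concrete, here is a minimal C sketch that folds per-process rows into a single leader row with parallel arrays. It is a toy model, not PostgreSQL code: ProcRow, LeaderRow, aggregate_rows, and MAX_WORKERS are hypothetical names, and 0 stands in for both SQL NULL and InvalidOid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORKERS 8   /* hypothetical cap for this sketch */

/* One row per process, as in the per-worker-row design (toy model). */
typedef struct ProcRow
{
    int      pid;
    int      leader_pid;            /* 0 stands in for SQL NULL: this is the leader */
    uint32_t current_index_relid;   /* 0 stands in for InvalidOid */
} ProcRow;

/* One row per command, as in the aggregated design (toy model). */
typedef struct LeaderRow
{
    int      pid;
    uint32_t current_index_relid;
    int      nworkers;
    int      worker_pids[MAX_WORKERS];
    uint32_t current_index_relids[MAX_WORKERS];
} LeaderRow;

/*
 * Fold the rows belonging to one vacuum into a single leader row.
 * Returns 0 on success, -1 if the leader row is missing or there are
 * more than MAX_WORKERS workers.
 */
int
aggregate_rows(const ProcRow *rows, size_t nrows, int leader_pid, LeaderRow *out)
{
    int found_leader = 0;

    assert(rows != NULL && out != NULL);

    out->pid = leader_pid;
    out->current_index_relid = 0;
    out->nworkers = 0;

    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i].pid == leader_pid && rows[i].leader_pid == 0)
        {
            /* The leader's own row carries the index it is processing itself. */
            out->current_index_relid = rows[i].current_index_relid;
            found_leader = 1;
        }
        else if (rows[i].leader_pid == leader_pid)
        {
            if (out->nworkers == MAX_WORKERS)
                return -1;
            /* Element k of both arrays describes the same worker. */
            out->worker_pids[out->nworkers] = rows[i].pid;
            out->current_index_relids[out->nworkers] = rows[i].current_index_relid;
            out->nworkers++;
        }
    }
    return found_leader ? 0 : -1;
}
```

The two arrays only make sense when read position by position, which is the source of the unnest awkwardness: a consumer has to pair element k of worker_pids with element k of current_index_relids before it can join either against pg_class.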
2. Stale values across phase transitions (Satya Narlapuram)
Satya identifies a concrete correctness bug: current_index_relid is set when a worker picks up an index but never cleared when the phase transitions to vacuuming heap, truncating heap, or even cleaning up indexes. The result is that the view reports a stale index OID during phases where no index is being processed — e.g.:
phase=vacuuming heap | current_index_relid=16392 (t1_pkey)
This is actively misleading and must be fixed by calling pgstat_progress_update_param(PROGRESS_VACUUM_CURRENT_INDEX_RELID, InvalidOid) at each phase exit (or better, at each phase entry so the invariant is "valid only during index phases"). Satya's other two points are minor: he questions whether a global GetParallelLeaderPid() is warranted when the leader PID could simply be stored in PVShared (the DSM segment already used by parallel vacuum to share state), and he notes leader_pid should probably be typed as integer to match pg_stat_activity.leader_pid.
The PVShared suggestion is the more interesting one architecturally — it avoids introducing a new global API surface (ParallelLeaderPid accessor) purely for progress reporting, keeping the leader-PID knowledge localized to the parallel vacuum machinery that already needs it. A global accessor invites misuse elsewhere.
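A minimal sketch of the PVShared-style alternative, assuming the leader records its PID in the shared state before any worker starts. ToyPVShared, ToyWorkerRow, leader_init_shared, and worker_fill_row are hypothetical names; the real PVShared struct has a different layout, and process IDs are passed as plain integers so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for state shared between leader and workers (not the real PVShared). */
typedef struct ToyPVShared
{
    uint32_t table_relid;
    int      leader_pid;   /* written once by the leader, read-only for workers */
} ToyPVShared;

/* The fields a worker would report in its progress row in this sketch. */
typedef struct ToyWorkerRow
{
    int      pid;
    int      leader_pid;
    uint32_t current_index_relid;
} ToyWorkerRow;

/* Leader side: fill the shared state before launching workers. */
void
leader_init_shared(ToyPVShared *shared, uint32_t table_relid, int leader_pid)
{
    assert(shared != NULL && leader_pid > 0);
    shared->table_relid = table_relid;
    shared->leader_pid = leader_pid;
}

/* Worker side: the leader PID comes from shared state, not a global accessor. */
void
worker_fill_row(const ToyPVShared *shared, int worker_pid,
                uint32_t index_relid, ToyWorkerRow *row)
{
    assert(shared != NULL && row != NULL);
    row->pid = worker_pid;
    row->leader_pid = shared->leader_pid;
    row->current_index_relid = index_relid;
}
```

Because the value travels inside the structure that parallel vacuum already hands to its workers, nothing outside that code needs to know how the leader PID is obtained.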
3. Nested progress reporting (Antonin Houska)
Antonin raises a more fundamental architectural concern from his ongoing REPACK work: the progress-reporting slots in PgBackendStatus are flat. A single backend can only report one "command" at a time, so when REPACK internally triggers an index build, the pg_stat_progress_create_index updates clobber the pg_stat_progress_repack counters (or vice versa). The same structural issue applies here: vacuum "containing" per-index work is conceptually nested progress.
His proposal is a general mechanism for sub-command progress tracking — allowing a command to push a nested progress context and pop it when done. If adopted, this would subsume Bharath's problem: each index vacuum becomes a sub-progress entry with its own relid, and parallel workers would naturally have their own sub-progress frames without cluttering the vacuum view.
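The push/pop idea can be illustrated with a small stack of progress frames. This is only a sketch of the concept as summarized above, not Antonin's design: ToyFrame, ToyProgressStack, progress_push, progress_update, and progress_pop are hypothetical names, and the sizes are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_PARAMS 4   /* arbitrary for this sketch */
#define TOY_MAX_DEPTH  4

typedef enum ToyCommand
{
    TOY_CMD_REPACK,
    TOY_CMD_CREATE_INDEX,
    TOY_CMD_VACUUM,
    TOY_CMD_INDEX_VACUUM
} ToyCommand;

typedef struct ToyFrame
{
    ToyCommand command;
    uint32_t   relid;
    int64_t    params[TOY_NUM_PARAMS];
} ToyFrame;

typedef struct ToyProgressStack
{
    int      depth;                    /* number of active frames */
    ToyFrame frames[TOY_MAX_DEPTH];
} ToyProgressStack;

/* Start a (sub-)command; returns 0 on success, -1 if the stack is full. */
int
progress_push(ToyProgressStack *st, ToyCommand command, uint32_t relid)
{
    ToyFrame *f;

    assert(st != NULL);
    if (st->depth == TOY_MAX_DEPTH)
        return -1;
    f = &st->frames[st->depth++];
    memset(f, 0, sizeof(*f));
    f->command = command;
    f->relid = relid;
    return 0;
}

/* Update a counter of the innermost command only; outer frames are untouched. */
void
progress_update(ToyProgressStack *st, int param, int64_t value)
{
    assert(st != NULL && st->depth > 0);
    assert(param >= 0 && param < TOY_NUM_PARAMS);
    st->frames[st->depth - 1].params[param] = value;
}

/* End the innermost command; returns 0 on success, -1 if nothing is active. */
int
progress_pop(ToyProgressStack *st)
{
    assert(st != NULL);
    if (st->depth == 0)
        return -1;
    st->depth--;
    return 0;
}
```

With a flat slot, the inner index build would overwrite the outer command's counters. Here the outer frame survives the nested command and becomes the innermost frame again after the pop.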
This is a much larger change and would likely block or reshape the narrower current_index_relid patch. Whether the community wants to wait for the generalized mechanism (which is tied to the not-yet-committed REPACK series) or ship the targeted fix now is the key strategic question the thread leaves open.
Implications for the Patch
The minimum viable patch must:
- Clear current_index_relid on phase exit (Satya's bug).
- Decide between per-worker rows vs. aggregated arrays (Sami's concern); this is the load-bearing design decision.
- Move leader-PID plumbing into PVShared rather than a new global accessor (cleaner layering).
- Align leader_pid type with pg_stat_activity.
The open architectural question is whether to accept this as a point solution or fold it into Antonin's nested-progress framework. Given that REPACK itself is still in flux, a pragmatic path is to commit the targeted fix now (with Sami's aggregated-row shape) and refactor onto nested progress later when that infrastructure lands.