Monthly Summary: EXPLAIN ANALYZE Wait Event Reporting (May 2026)
Overview
This RFC proposes adding a new EXPLAIN (ANALYZE, WAITS) option that transforms PostgreSQL's wait event subsystem from a point-in-time state indicator into a time-integrated profile. The feature would produce per-statement and per-plan-node wait event breakdowns, answering the DBA question: "During this query, where did time go?"
The thread progressed through three patch versions (v0→v1→v2) during May but received zero technical reviews. The only external interaction was a procedural correction from Michael Paquier about mailing list conventions.
Design Summary
The patch series (7 patches) hooks into the existing pgstat_report_wait_start()/pgstat_report_wait_end() call sites with a single unlikely() branch on a global boolean. When EXPLAIN (ANALYZE, WAITS) is active, wait durations are accumulated into preallocated fixed-size arrays (64 entries per accumulator) with an overflow bucket for queries exceeding 64 distinct wait event types.
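The disabled-path cost comes down to one predictable branch on a global flag. The following is a minimal sketch of that gating pattern, not the patch's code: `wait_usage_active`, `sketch_wait_start()`, `sketch_wait_end()` and the `sketch_clock_ns` stand-in clock are hypothetical names chosen for illustration, and a real implementation would read a monotonic clock rather than a settable counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branch hint in the spirit of PostgreSQL's unlikely() macro. */
#if defined(__GNUC__)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define unlikely(x) ((x) != 0)
#endif

/* Hypothetical state; the patch's real identifiers are not given in the summary. */
static bool     wait_usage_active = false; /* set while EXPLAIN (ANALYZE, WAITS) runs */
static uint64_t wait_start_ns = 0;         /* timestamp taken at wait start */
static uint64_t wait_total_ns = 0;         /* accumulated wait time */

/* Stand-in clock so the sketch is deterministic; an assumption, not a real API. */
static uint64_t sketch_clock_ns = 0;

static inline void
sketch_wait_start(void)
{
    /* Disabled case: a single not-taken branch, no clock read. */
    if (unlikely(wait_usage_active))
        wait_start_ns = sketch_clock_ns;
}

static inline void
sketch_wait_end(void)
{
    if (unlikely(wait_usage_active))
    {
        assert(sketch_clock_ns >= wait_start_ns);
        wait_total_ns += sketch_clock_ns - wait_start_ns;
    }
}
```

With the flag clear, both functions fall through without touching the clock or the accumulator, which is the property the overhead figures quoted later depend on.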
Key architectural decisions:
- Allocation-free wait-end path: The accumulator never allocates memory at wait-end time (critical for paths inside WAL flushes, LWLock releases, etc.)
- Sorted arrays with binary search: O(log n) lookup per wait-end, enabling efficient merge-sort aggregation from parallel workers
- Inclusive per-node attribution: Mirrors existing EXPLAIN ANALYZE timing semantics; a wait is charged to all active nodes in the plan stack
- Parallel worker aggregation: DSA-backed per-worker accumulation with proper merge-on-rescan semantics
- Opaque API: WaitEventUsage internals hidden behind accessor functions
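The accumulator decisions above can be sketched as a fixed-size sorted array with an overflow bucket. This is an illustration under stated assumptions: the type and function names (`WaitAcc`, `wait_acc_add`, `wait_acc_merge`) and the entry layout are hypothetical, and only the 64-entry bound comes from the thread. For brevity the merge reuses the insert path instead of a linear two-way merge of the sorted arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAIT_ACC_MAX 64             /* bound stated in the design summary */

typedef struct WaitAccEntry
{
    uint32_t event;                 /* wait event identifier */
    uint64_t count;                 /* number of waits observed */
    uint64_t total_ns;              /* accumulated wait time */
} WaitAccEntry;

typedef struct WaitAcc
{
    int          nentries;
    WaitAccEntry entries[WAIT_ACC_MAX]; /* kept sorted by event */
    WaitAccEntry overflow;              /* waits that no longer fit */
} WaitAcc;

/* Record a wait without allocating: binary search, then update or insert. */
static void
wait_acc_add(WaitAcc *acc, uint32_t event, uint64_t count, uint64_t ns)
{
    int lo = 0;
    int hi = acc->nentries;

    assert(acc->nentries >= 0 && acc->nentries <= WAIT_ACC_MAX);

    /* Lower-bound search: first slot whose event is >= the target. */
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (acc->entries[mid].event < event)
            lo = mid + 1;
        else
            hi = mid;
    }

    if (lo < acc->nentries && acc->entries[lo].event == event)
    {
        acc->entries[lo].count += count;
        acc->entries[lo].total_ns += ns;
        return;
    }

    if (acc->nentries == WAIT_ACC_MAX)
    {
        /* Array is full: charge the overflow bucket instead. */
        acc->overflow.count += count;
        acc->overflow.total_ns += ns;
        return;
    }

    /* Shift the tail right by one slot to keep the array sorted. */
    memmove(&acc->entries[lo + 1], &acc->entries[lo],
            (size_t) (acc->nentries - lo) * sizeof(WaitAccEntry));
    acc->entries[lo].event = event;
    acc->entries[lo].count = count;
    acc->entries[lo].total_ns = ns;
    acc->nentries++;
}

/* Fold one accumulator (e.g. a worker's) into another (e.g. the leader's). */
static void
wait_acc_merge(WaitAcc *dst, const WaitAcc *src)
{
    for (int i = 0; i < src->nentries; i++)
        wait_acc_add(dst, src->entries[i].event,
                     src->entries[i].count, src->entries[i].total_ns);

    dst->overflow.count += src->overflow.count;
    dst->overflow.total_ns += src->overflow.total_ns;
}
```

A caller would zero-initialize a `WaitAcc` once per statement or plan node. Every later `wait_acc_add()` touches only that preallocated storage, which is what keeps the wait-end path allocation-free.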
Patch Evolution
v0 → v1 (Process fix)
No code changes. Consolidated seven separate patch emails into a single email per pgsql-hackers conventions, per Michael Paquier's guidance.
v1 → v2 (Test stability)
No accounting-code changes. Fixed regression test failures on FreeBSD CFBot:
- Statement-level tests now assert presence of expected waits rather than exact match (infrastructure waits like IPC/DSM are non-deterministic)
- JSON tests use JSONPath for flexible matching
- Serial tests explicitly disable debug_parallel_query to eliminate spurious parallel waits
- Parallel rescan accumulation test removed as too fragile for the parallel regression harness
Open Review Questions (Unanswered)
- Naming: WAITS (terse, matches BUFFERS/WAL/IO) vs WAIT_EVENTS (more discoverable, matches pg_stat_activity terminology)
- Inclusive vs exclusive attribution: Should node-level waits be summable (exclusive) or mirror existing timing semantics (inclusive)?
- Fixed accumulator limit: Is 64 entries the right bound? Is overflow-bucket semantics acceptable?
- Hot-path overhead: Is ~0.1–0.2 ns per wait with the feature disabled acceptable? Reviewers will likely ask for CPU-pinned benchmarks on Linux.
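To make the inclusive-versus-exclusive question concrete, here is a toy sketch of the two charging rules applied to a stack of active plan nodes. Everything in it is hypothetical (`NodeStack`, `charge_inclusive`, `charge_exclusive`, the small fixed bounds); it illustrates the semantics only and does not reflect how the patch tracks the plan stack.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 8                 /* illustrative bounds only */
#define MAX_NODES 8

typedef struct NodeStack
{
    int depth;                      /* number of active nodes */
    int node_ids[MAX_DEPTH];        /* outermost first, innermost last */
} NodeStack;

/* Inclusive: every node on the stack is charged, as with EXPLAIN ANALYZE timing. */
static void
charge_inclusive(const NodeStack *s, uint64_t ns, uint64_t per_node[MAX_NODES])
{
    assert(s->depth >= 0 && s->depth <= MAX_DEPTH);
    for (int i = 0; i < s->depth; i++)
        per_node[s->node_ids[i]] += ns;
}

/* Exclusive: only the innermost node is charged, so node totals are summable. */
static void
charge_exclusive(const NodeStack *s, uint64_t ns, uint64_t per_node[MAX_NODES])
{
    assert(s->depth >= 0 && s->depth <= MAX_DEPTH);
    if (s->depth > 0)
        per_node[s->node_ids[s->depth - 1]] += ns;
}
```

Take a 100 ns wait that occurs while node 1 (say, a scan) runs beneath node 0 (say, a sort). The inclusive rule reports 100 ns on both nodes, so per-node figures add up to more than the statement total. The exclusive rule reports 100 ns on node 1 only.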
Status
The patch remains at RFC stage with no technical engagement from committers or reviewers. The design is mature and follows established patterns from prior instrumentation features (BUFFERS, WAL, IO), but has not yet attracted review attention.