RFC: Allow EXPLAIN to Output Page Fault Information

First seen: 2024-12-24 08:53:06+00:00 · Messages: 48 · Participants: 11

Latest Update

2026-06-04 · claude-opus-4-6

Incremental Update: 2026-06-01

Author Response to Review and Viability Concerns

Atsushi Torikoshi responded to both the bug report from Ilmar Yunusov's commitfest review and the strategic viability challenge from Lukas Fittl. No new patch version was posted.

Key Points

1. Acknowledges the structured output bug — will fix: Torikoshi confirms he will fix the spurious "Execution Storage I/O" section in non-text formats when ANALYZE is not specified. He notes this may become moot because the plan is to move the feature to the new IO option, which (unlike BUFFERS) would require ANALYZE to be specified, eliminating the without-ANALYZE code path entirely.

2. Defends the feature's value despite worker-based AIO limitations: Torikoshi explicitly concedes that many users will use io_method=worker but argues the feature still has value, on three grounds:

Users on sync I/O or io_uring can still use it
High-performance on-premises environments tend to prefer io_uring for performance reasons
io_uring will likely become the long-term preferred option for performance-focused deployments

This deliberately narrows the target audience from "all PostgreSQL users" to "performance-focused users on io_uring or sync I/O."

3. No progress on worker-compatible alternative: Torikoshi admits he does not currently have a good idea for making this work with I/O workers, and references his earlier experiment showing per-node getrusage() calls had prohibitive overhead — implying per-I/O calls within workers would be even worse.

Assessment

This is primarily an acknowledgment message with no new code or technical proposals. The author is holding position on the feature's value proposition while accepting its scope limitation. The patch remains in Waiting on Author status with no new version posted.

History (2 prior analyses)

2026-06-01 · claude-opus-4-6

Monthly Summary: RFC: Allow EXPLAIN to Output Page Fault / Storage I/O Information — May 2026

Overview

This month saw the discussion mature from technical refinement into a serious viability debate. While incremental progress was made on integration details (moving the feature to PG19's new IO option), the fundamental question of whether the approach has a future given worker-based AIO dominance emerged as the central concern.

Background & Technical Approach

The patch exposes per-query physical storage I/O metrics through EXPLAIN, using getrusage(2)'s ru_inblock/ru_oublock counters. This fills a diagnostic gap: EXPLAIN (ANALYZE, BUFFERS) shows buffer cache hits vs. misses but cannot distinguish OS page cache reads (fast) from actual disk I/O (slow). The approach evolved from an earlier page-fault-based design that proved unreliable.

Key implementation characteristics:

Two getrusage() calls bracket each phase (planning, execution)
Parallel workers transmit accumulated I/O to the leader via existing communication channels
Reports aggregate per-phase totals (not per-node, due to overhead)
Works on Linux, FreeBSD, macOS, and most Unix variants; not on Windows
Suppresses output when io_method=worker because worker-dispatched I/O isn't attributed to the submitting backend
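
The bracketing technique listed above can be illustrated outside PostgreSQL. The following is a minimal Python sketch, not the patch's C code: it assumes a Unix platform where the standard resource module exposes ru_inblock and ru_oublock, and the names StorageIOUsage, measure_phase, and should_report are hypothetical, chosen only for illustration.

```python
import resource
from dataclasses import dataclass


@dataclass
class StorageIOUsage:
    """Block I/O counter deltas for one phase (hypothetical type, not from the patch)."""
    read_blocks: int = 0   # delta of ru_inblock
    write_blocks: int = 0  # delta of ru_oublock

    def add(self, other: "StorageIOUsage") -> None:
        """Fold another process's totals into this one, as a leader would for its workers."""
        self.read_blocks += other.read_blocks
        self.write_blocks += other.write_blocks


def measure_phase(phase):
    """Run phase() between two getrusage() calls; return (result, usage delta)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = phase()
    after = resource.getrusage(resource.RUSAGE_SELF)
    usage = StorageIOUsage(
        read_blocks=after.ru_inblock - before.ru_inblock,
        write_blocks=after.ru_oublock - before.ru_oublock,
    )
    return result, usage


def should_report(io_method: str) -> bool:
    """Suppress output for io_method=worker: that I/O is not charged to this process."""
    return io_method != "worker"
```

Calling measure_phase once around planning and once around execution yields the two per-phase totals. Because the counters belong to the calling process, any I/O performed on its behalf by a separate worker process never appears in the delta, which is why the sketch reports nothing for io_method=worker.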

Key Developments This Month

1. Integration Redirect: BUFFERS → IO Option

Jelte Fennema-Nio proposed moving Storage I/O tracking from the BUFFERS option to PG19's new dedicated IO option for EXPLAIN. This provides cleaner semantic separation — BUFFERS handles shared buffer hit/miss accounting while IO handles actual I/O operations. Atsushi Torikoshi agreed but flagged a scope mismatch: the IO option documents itself as reporting I/O from scan nodes, while getrusage()-based tracking captures all backend I/O including temporary file spills from sorts and hash joins.

2. Viability Crisis: Worker-Based AIO Dominance

Lukas Fittl raised the most strategically significant challenge of the month, providing concrete deployment evidence that undermines the patch's practical value:

AWS RDS/Aurora (managed providers with large market share) only offer I/O workers
Container environments commonly have io_uring disabled at kernel level
The only path where the patch works (io_uring) will be unavailable to a majority of PG19+ users

This transforms earlier theoretical concerns from Andres Freund into a concrete deployment reality. Lukas asked whether per-I/O getrusage() calls within worker processes might be feasible, while acknowledging the likely performance cost.

3. Patch Maintenance

Atsushi Torikoshi updated the patch to clarify that reported values include parallel worker contributions, and addressed prior review feedback on AIO-awareness documentation.

Unresolved Tensions

Issue	Status
io_uring not universally available	Unresolved; RHEL kernel restrictions, liburing dependency, ulimit tuning needed
Worker-based AIO incompatibility	Fundamental blocker for majority of deployments
Per-node vs. per-query granularity	Settled as per-query only (overhead too high for per-node)
IO option scope mismatch	Needs documentation clarification
Alternative measurement mechanisms	Unexplored; Lukas suggested investigating worker-compatible approaches

Current Status

The patch is technically functional but faces an existential question: with worker-based AIO as the default and io_uring unavailable in most managed/containerized deployments, the feature would be invisible to the majority of users. The discussion has shifted from "how to integrate" to "should we pursue this approach at all" or whether an entirely different mechanism is needed. No committer has stepped forward, and the patch remains in "needs review" state.

2026-06-01 · claude-opus-4-6

Incremental Update: 2026-05-30

New Review Identifies a Bug in Structured Output

Ilmar Yunusov submitted a formal commitfest review of v11, marking it "Implements feature: tested, failed" and moving the patch status to Waiting on Author.

Key Technical Finding: Spurious Execution Section Without ANALYZE

The most significant finding is a bug in structured EXPLAIN output formats (JSON, XML, YAML). When BUFFERS is enabled without ANALYZE, the patch emits an "Execution Storage I/O" section with zeroed counters:

"Execution": {
  "Storage I/O Read": 0,
  "Storage I/O Write": 0
}

This is incorrect because:

Without ANALYZE, the query is not actually executed — so no execution-phase I/O exists to report
The existing BUFFERS documentation explicitly states that without ANALYZE, only planning-phase buffer usage is reported
Text format already behaves correctly (no Execution section without ANALYZE)

The root cause is identified: ExplainOnePlan() checks es->buffers but not es->analyze before emitting execution-phase Storage I/O, and peek_storageio_usage() returns true for non-text formats even when both counters are zero.

Fix needed: Gate execution Storage I/O output on es->analyze, consistent with how Execution Time is handled.
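
The gating logic described in the review can be modeled in a few lines. This is a hedged Python sketch of the decision only, not the PostgreSQL source: peek_storageio_usage here is a stand-in that mirrors the behavior the review reports (always true for non-text formats), its text-format branch is an assumption inferred from that description, and the two emits_* functions are hypothetical names.

```python
def peek_storageio_usage(read_blocks: int, write_blocks: int, text_format: bool) -> bool:
    """Stand-in for the helper named in the review.

    Non-text formats return True even when both counters are zero (as reported).
    The text-format branch is an assumption: True only if something was counted.
    """
    if not text_format:
        return True
    return read_blocks > 0 or write_blocks > 0


def emits_execution_section_before_fix(*, buffers, analyze, text_format,
                                       read_blocks=0, write_blocks=0) -> bool:
    """Reported behavior: only the BUFFERS flag is checked; analyze is ignored."""
    return buffers and peek_storageio_usage(read_blocks, write_blocks, text_format)


def emits_execution_section_after_fix(*, buffers, analyze, text_format,
                                      read_blocks=0, write_blocks=0) -> bool:
    """Proposed behavior: execution-phase output additionally requires ANALYZE."""
    return (buffers and analyze
            and peek_storageio_usage(read_blocks, write_blocks, text_format))
```

With buffers=True, analyze=False, and a non-text format, the first function returns True (the spurious zeroed section) while the second returns False. Text format yields False in both cases, consistent with the review's note that text output already behaves correctly.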

Minor Issues

Three trailing whitespace warnings in the TAP test file (011_explain_storage_io.pl lines 47, 55, 61)

Patch Status Change

The patch moved from "Needs Review" to "Waiting on Author" — requiring Atsushi Torikoshi to address the structured output bug and whitespace issues before further review progress.