Technical Analysis: Use Streaming Read I/O When Enabling Data Checksums Online
Core Problem
PostgreSQL's online checksum enablement feature (which allows enabling data checksums on a running cluster without downtime) currently uses ReadBufferExtended() to read relation pages one at a time. This is a sequential, synchronous I/O pattern that does not take advantage of the streaming read infrastructure that has been developed in recent PostgreSQL versions.
The streaming read API (read streams) allows PostgreSQL to issue prefetch/read-ahead requests, enabling the kernel and storage layer to optimize I/O by batching reads, reducing latency through asynchronous operations, and improving throughput for sequential scan patterns. Since enabling checksums online requires reading every page of every relation in the cluster, this is an inherently sequential-scan-heavy workload that is ideally suited for streaming reads.
Architectural Context
ReadBufferExtended() vs. Streaming Reads
ReadBufferExtended() is the traditional single-buffer read interface in PostgreSQL. It reads one buffer at a time from shared buffers (or from disk if not cached). For bulk sequential operations, this creates a pattern of:
- Request block N
- Wait for I/O completion
- Process block N
- Request block N+1
- Repeat
The streaming read infrastructure (introduced in v17 and expanded through the v18 and v19 development cycles) wraps this pattern in a higher-level API that:
- Maintains a window of upcoming read requests
- Issues prefetch (posix_fadvise or AIO) calls ahead of consumption
- Allows the OS to coalesce and pipeline I/O requests
- Significantly reduces wall-clock time for sequential scan workloads
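The difference between the two patterns above can be illustrated with a small timing model. This is only a sketch under idealized assumptions: every read completes a fixed number of ticks after it is issued, reads in flight do not slow each other down, and all function names and parameters are hypothetical. It is not PostgreSQL code and says nothing about real-world speedups.

```c
#include <assert.h>

#define MAX_WINDOW 64           /* arbitrary cap for this model */

/* One-at-a-time pattern: request a block, wait for it, process it, repeat. */
long simulate_one_at_a_time(int nblocks, long latency, long process_cost)
{
    long now = 0;

    for (int blk = 0; blk < nblocks; blk++)
    {
        now += latency;         /* request block and wait for I/O completion */
        now += process_cost;    /* process the block */
    }
    return now;
}

/*
 * Windowed pattern: keep up to `window` blocks requested but not yet
 * consumed. Assumes (hypothetically) that each read finishes exactly
 * `latency` ticks after it is issued, independent of other reads.
 */
long simulate_windowed(int nblocks, long latency, long process_cost, int window)
{
    long ready_at[MAX_WINDOW];  /* completion time of each in-flight read */
    long now = 0;
    int  issued = 0;

    assert(window >= 1 && window <= MAX_WINDOW);

    for (int blk = 0; blk < nblocks; blk++)
    {
        /* Top up the window before consuming the next block. */
        while (issued < nblocks && issued - blk < window)
        {
            ready_at[issued % window] = now + latency;
            issued++;
        }

        /* Stall only if the block we need has not completed yet. */
        if (ready_at[blk % window] > now)
            now = ready_at[blk % window];

        now += process_cost;
    }
    return now;
}
```

In this model, 8 blocks with a latency of 10 ticks and a processing cost of 1 tick take 88 ticks one at a time and 25 ticks with a window of 4. A window of 1 degenerates to the one-at-a-time result, which is a useful sanity check on the model.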
Online Checksum Enablement
Enabling checksums online is a background operation that must visit every data page in the cluster to compute and write checksums. This is fundamentally a full-cluster sequential scan. The operation was developed when streaming reads did not yet exist, so it naturally used the only available interface: ReadBufferExtended().
Proposed Solution
The patch (described as "simple") replaces ReadBufferExtended() calls in the online checksum enablement code path with the streaming read API. This is a straightforward modernization: the same conversion has already been applied to other sequential scan code paths in PostgreSQL (e.g., VACUUM, sequential scans in the executor, ANALYZE).
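PostgreSQL's read stream API is callback-driven: the caller supplies a function that names the next block to read, then pulls buffers from the stream until it is exhausted. The standalone model below imitates only that shape, to show how a per-block loop turns into a "callback plus consumer" loop. Every type and function in it (the toy_ names) is hypothetical; it is neither PostgreSQL code nor the patch, and it performs no real I/O.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INVALID_BLOCK UINT32_MAX
#define TOY_MAX_LOOKAHEAD 16

/* Callback type: return the next block number, or TOY_INVALID_BLOCK at end. */
typedef uint32_t (*toy_next_block_cb)(void *private_data);

/* Callback state for "every block in [current, last_exclusive)". */
typedef struct ToyRange { uint32_t current, last_exclusive; } ToyRange;

uint32_t toy_range_cb(void *private_data)
{
    ToyRange *r = (ToyRange *) private_data;

    if (r->current >= r->last_exclusive)
        return TOY_INVALID_BLOCK;
    return r->current++;
}

typedef struct ToyStream
{
    toy_next_block_cb cb;
    void    *cb_data;
    uint32_t queue[TOY_MAX_LOOKAHEAD];  /* blocks requested, not yet consumed */
    int      head, count, lookahead, exhausted;
} ToyStream;

void toy_stream_begin(ToyStream *s, int lookahead,
                      toy_next_block_cb cb, void *cb_data)
{
    assert(lookahead >= 1 && lookahead <= TOY_MAX_LOOKAHEAD);
    s->cb = cb;
    s->cb_data = cb_data;
    s->head = s->count = s->exhausted = 0;
    s->lookahead = lookahead;
}

/* Refill the lookahead queue, then hand back the oldest queued block. */
uint32_t toy_stream_next(ToyStream *s)
{
    uint32_t blk;

    while (!s->exhausted && s->count < s->lookahead)
    {
        blk = s->cb(s->cb_data);
        if (blk == TOY_INVALID_BLOCK)
        {
            s->exhausted = 1;
            break;
        }
        /* A real stream would start the read for this block here. */
        s->queue[(s->head + s->count) % s->lookahead] = blk;
        s->count++;
    }

    if (s->count == 0)
        return TOY_INVALID_BLOCK;

    blk = s->queue[s->head];
    s->head = (s->head + 1) % s->lookahead;
    s->count--;
    return blk;
}

/* Consumer loop: returns blocks visited, or -1 if any arrived out of order. */
long toy_scan(uint32_t nblocks, int lookahead)
{
    ToyRange  range = {0, nblocks};
    ToyStream s;
    uint32_t  blk, expected = 0;

    toy_stream_begin(&s, lookahead, toy_range_cb, &range);
    while ((blk = toy_stream_next(&s)) != TOY_INVALID_BLOCK)
    {
        if (blk != expected)
            return -1;
        expected++;             /* stand-in for the per-page work */
    }
    return (long) expected;
}
```

The point of the design is that the consumer loop no longer decides when a read is issued; it only says which blocks it wants (through the callback) and consumes them in order, leaving the stream free to request blocks ahead of time.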
Key Technical Considerations
- Performance Impact: For large databases, enabling checksums online can be a lengthy operation. Streaming reads could significantly reduce the wall-clock time by improving I/O throughput, particularly on rotational media or high-latency storage.
- Rate Limiting Interaction: The online checksum process likely has rate-limiting/throttling to avoid impacting production workloads. The interaction between streaming read prefetch and throttling needs consideration: prefetching aggressively while throttling processing could waste buffer pool space.
- Buffer Pool Pressure: The online checksum operation uses a bulk-read strategy (likely with the BAS_BULKREAD access strategy) to avoid evicting hot pages. The streaming read API needs to respect this same strategy to avoid regressing buffer pool behavior.
- Correctness vs. Performance Focus: As noted by Daniel Gustafsson, recent development work on this feature focused on correctness of operation. The I/O pattern modernization is a performance optimization that was deferred to avoid destabilizing the correctness work.
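The throttling concern above can be made concrete with a toy model that counts how many pages have been read ahead but not yet processed each time the worker pauses. This is a sketch under assumptions: the cost-accounting scheme (a fixed cost per page and a pause whenever the accumulated cost reaches a limit) is hypothetical and is not taken from the checksum worker, and all names are invented for illustration.

```c
#include <assert.h>

typedef struct ThrottleStats
{
    int sleeps;             /* number of throttle pauses taken */
    int max_idle_ahead;     /* most pages read ahead but idle during a pause */
} ThrottleStats;

/*
 * Scan `nblocks` pages with a bounded lookahead window, pausing whenever the
 * accumulated page cost reaches `cost_limit` (hypothetical accounting).
 */
ThrottleStats simulate_throttled_scan(int nblocks, int lookahead,
                                      int page_cost, int cost_limit)
{
    ThrottleStats st = {0, 0};
    int issued = 0;         /* pages requested so far */
    int balance = 0;        /* cost accumulated since the last pause */

    assert(lookahead >= 1 && page_cost > 0 && cost_limit > 0);

    for (int blk = 0; blk < nblocks; blk++)
    {
        /* Keep up to `lookahead` pages requested but not yet processed. */
        while (issued < nblocks && issued - blk < lookahead)
            issued++;

        /* "Process" page blk and charge its cost. */
        balance += page_cost;

        if (balance >= cost_limit)
        {
            /* Pages already read ahead sit idle for the whole pause. */
            int ahead = issued - (blk + 1);

            st.sleeps++;
            if (ahead > st.max_idle_ahead)
                st.max_idle_ahead = ahead;
            balance = 0;
        }
    }
    return st;
}
```

In this model the number of idle read-ahead pages during a pause never exceeds the lookahead minus one, so the lookahead distance, rather than the throttle settings, bounds how much buffer space sits unused while the worker sleeps. For example, 100 pages with a lookahead of 8, a page cost of 10, and a cost limit of 200 produce 5 pauses with at most 7 idle pages.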
Timing and Process
The patch was submitted during the v19 feature freeze period. Daniel Gustafsson indicated this should be targeted for when the tree opens after v19, meaning it would be a v20 improvement. This is appropriate for a non-critical optimization that doesn't affect correctness.
Assessment
This is a low-risk, mechanically straightforward modernization patch. The streaming read API has been battle-tested across multiple code paths in recent releases, and applying it to online checksum enablement is a natural next step. The main review considerations would be:
- Ensuring the buffer access strategy is preserved correctly
- Verifying throttling/rate-limiting still functions as intended
- Confirming no regression in memory usage or buffer pool behavior