BM_IO_ERROR Flag Lost in TerminateBufferIO Due to UnlockBufHdrExt Order of Operations
Core Problem
This thread identifies a subtle but significant bug in PostgreSQL's buffer I/O completion logic introduced in the HEAD branch (PostgreSQL 19 development cycle) as part of the asynchronous I/O (AIO) infrastructure work.
The Bug Mechanics
The issue lies in the interaction between UnlockBufHdrExt() and TerminateBufferIO():
UnlockBufHdrExt manipulates buffer header state bits in this order:
buf_state |= set_bits; // First: set requested bits
buf_state &= ~unset_bits; // Second: clear requested bits
buf_state &= ~BM_LOCKED; // Third: release the lock
TerminateBufferIO unconditionally adds BM_IO_ERROR to the unset_flag_bits mask:
unset_flag_bits |= BM_IO_ERROR;
The critical problem: a caller such as AbortBufferIO() or buffer_readv_complete_one() may want to record an I/O failure. To do so it passes BM_IO_ERROR to TerminateBufferIO in set_flag_bits, which reaches UnlockBufHdrExt as set_bits. However, TerminateBufferIO has also added BM_IO_ERROR to unset_flag_bits, which reaches UnlockBufHdrExt as unset_bits. Because UnlockBufHdrExt applies the unset mask after the set mask, the bit is cleared immediately after being set. The net result is that BM_IO_ERROR can never be set on a buffer through TerminateBufferIO.
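The lost update can be reproduced with a minimal single-threaded model. The sketch below rests on several assumptions. The flag values are hypothetical and are not PostgreSQL's real bit positions. The atomic compare-and-swap loop, refcount handling and other parameters of the real functions are omitted. The model also clears BM_IO_IN_PROGRESS on completion as a simplification. The function names model_unlock_buf_hdr_ext and model_terminate_buffer_io_buggy are illustrative and are not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define BM_LOCKED         ((uint64_t) 1 << 0)
#define BM_IO_IN_PROGRESS ((uint64_t) 1 << 2)
#define BM_IO_ERROR       ((uint64_t) 1 << 3)

/* Models the order described for UnlockBufHdrExt: set, then unset, then unlock. */
static uint64_t
model_unlock_buf_hdr_ext(uint64_t buf_state, uint64_t set_bits, uint64_t unset_bits)
{
    assert(buf_state & BM_LOCKED);   /* model requires the header lock to be held */
    buf_state |= set_bits;           /* first: set requested bits */
    buf_state &= ~unset_bits;        /* second: clear requested bits */
    buf_state &= ~BM_LOCKED;         /* third: release the lock */
    return buf_state;
}

/* Models the reported TerminateBufferIO behavior: BM_IO_ERROR is always unset. */
static uint64_t
model_terminate_buffer_io_buggy(uint64_t buf_state, uint64_t set_flag_bits)
{
    uint64_t unset_flag_bits = BM_IO_IN_PROGRESS;   /* simplification: I/O is over */

    unset_flag_bits |= BM_IO_ERROR;                  /* the unconditional clear */
    return model_unlock_buf_hdr_ext(buf_state, set_flag_bits, unset_flag_bits);
}
```

Calling model_terminate_buffer_io_buggy(BM_LOCKED | BM_IO_IN_PROGRESS, BM_IO_ERROR) returns a state in which BM_IO_ERROR is not set, even though the caller explicitly asked for it.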
Architectural Significance
BM_IO_ERROR is the mechanism by which PostgreSQL marks buffers whose I/O operations have failed. While the reporter notes that the system doesn't heavily rely on BM_IO_ERROR currently (which is why the bug went unnoticed in standard testing), this flag is architecturally important for:
- Error propagation: other backends waiting on a buffer's I/O completion need to know the I/O failed
- AIO infrastructure correctness: as async I/O matures, proper error state tracking becomes critical
- Extension ecosystem: the bug was discovered via proprietary code that asserted BM_IO_ERROR was set after a PGAIO_RS_ERROR result
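This is fundamentally an order-of-operations bug in bit manipulation. When a single helper both sets and clears bits in the same state word, any bit that appears in both masks ends up in whatever state the later operation leaves it. Here, that is the clear.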
Proposed Solution
Yura Sokolov proposes a multi-part fix:
1. Fix the TerminateBufferIO Logic
Make TerminateBufferIO exclude BM_IO_ERROR from unset_flag_bits when it is already present in set_flag_bits. This ensures the unconditional clearing doesn't conflict with an explicit set request.
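The fix can be sketched against the same simplified model. It rests on the same assumptions: hypothetical flag values, no atomics or refcounts, and illustrative function names that are not PostgreSQL's. The masking expression is one way to express "exclude BM_IO_ERROR when it is already in set_flag_bits". It is not necessarily the patch's exact code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define BM_LOCKED         ((uint64_t) 1 << 0)
#define BM_VALID          ((uint64_t) 1 << 1)
#define BM_IO_IN_PROGRESS ((uint64_t) 1 << 2)
#define BM_IO_ERROR       ((uint64_t) 1 << 3)

/* Same set-then-unset-then-unlock order as described for UnlockBufHdrExt. */
static uint64_t
model_unlock_buf_hdr_ext(uint64_t buf_state, uint64_t set_bits, uint64_t unset_bits)
{
    assert(buf_state & BM_LOCKED);
    buf_state |= set_bits;
    buf_state &= ~unset_bits;
    buf_state &= ~BM_LOCKED;
    return buf_state;
}

/* Fixed model: clear a stale BM_IO_ERROR only if the caller is not setting it. */
static uint64_t
model_terminate_buffer_io_fixed(uint64_t buf_state, uint64_t set_flag_bits)
{
    uint64_t unset_flag_bits = BM_IO_IN_PROGRESS;   /* simplification: I/O is over */

    unset_flag_bits |= BM_IO_ERROR & ~set_flag_bits;
    return model_unlock_buf_hdr_ext(buf_state, set_flag_bits, unset_flag_bits);
}
```

The fix keeps both behaviors. A failing I/O that passes BM_IO_ERROR now leaves the flag set. A later successful I/O that passes, for example, BM_VALID still clears a stale BM_IO_ERROR left by an earlier failure.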
2. Add Defensive Assert in UnlockBufHdrExt
Assert(!(set_bits & unset_bits));
This catches any future case where the same bit appears in both the set and unset masks — a logical contradiction that should never occur.
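A minimal sketch of where such a check sits, again in the simplified model with hypothetical flag values. The C standard assert() stands in for PostgreSQL's Assert() macro, which is only active in assert-enabled builds. The helper masks_conflict is a hypothetical name introduced here for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define BM_LOCKED         ((uint64_t) 1 << 0)
#define BM_IO_IN_PROGRESS ((uint64_t) 1 << 2)
#define BM_IO_ERROR       ((uint64_t) 1 << 3)

/* True when some bit is requested in both masks: a contradictory request. */
static bool
masks_conflict(uint64_t set_bits, uint64_t unset_bits)
{
    return (set_bits & unset_bits) != 0;
}

static uint64_t
model_unlock_buf_hdr_ext(uint64_t buf_state, uint64_t set_bits, uint64_t unset_bits)
{
    /* Stand-in for Assert(!(set_bits & unset_bits)). */
    assert(!masks_conflict(set_bits, unset_bits));
    assert(buf_state & BM_LOCKED);
    buf_state |= set_bits;
    buf_state &= ~unset_bits;
    buf_state &= ~BM_LOCKED;
    return buf_state;
}
```

With the original TerminateBufferIO behavior, any call that sets BM_IO_ERROR would trip this assertion in an assert-enabled build instead of silently dropping the bit.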
3. Test Coverage (v1-001)
Modifies the 001_aio.pl test to verify that BM_IO_ERROR is set after a read failure, exercising the AIO error path.
4. FlushBuffer Failure Test (v1-002)
A separate test uses injection points to check behavior when a write fails. The author is unsure whether 001_aio.pl is the right place for it, since writes are not yet performed asynchronously.
5. Cosmetic Fix (v1-003)
Makes DebugPrintBufferRefcount output formatting cleaner.
Design Tradeoff Discussion
The fix raises a minor design question: should an overlap between set and unset bits be forbidden outright by an assertion, or should the code handle the overlap gracefully? Sokolov's approach does both. It adds the assertion, which fires only in assert-enabled builds, and it makes TerminateBufferIO avoid creating the contradiction in the first place. This is the defensive programming approach appropriate for buffer manager code.
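For contrast, the "handle it gracefully" option could look like the sketch below. This hypothetical variant of the model applies the unset mask before the set mask, so an explicit set always wins an overlap. It is not what the thread proposes. The flag values and the name model_unlock_set_wins are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define BM_LOCKED         ((uint64_t) 1 << 0)
#define BM_IO_IN_PROGRESS ((uint64_t) 1 << 2)
#define BM_IO_ERROR       ((uint64_t) 1 << 3)

/* Hypothetical alternative ordering: clear first, then set, then unlock. */
static uint64_t
model_unlock_set_wins(uint64_t buf_state, uint64_t set_bits, uint64_t unset_bits)
{
    assert(buf_state & BM_LOCKED);
    buf_state &= ~unset_bits;   /* clear first ... */
    buf_state |= set_bits;      /* ... so an explicitly set bit survives any overlap */
    buf_state &= ~BM_LOCKED;
    return buf_state;
}
```

This ordering would also have kept BM_IO_ERROR. The trade-off is that it hides contradictory masks instead of reporting them. Keeping the original order, adding the assertion and fixing the caller makes such mistakes visible during development.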
Context: PostgreSQL 19 AIO Infrastructure
This bug was introduced during the PostgreSQL 19 development cycle, in work building on the asynchronous I/O subsystem that first shipped in PostgreSQL 18. The unset_flag_bits parameter and the unconditional BM_IO_ERROR clearing in TerminateBufferIO are new code patterns. Michael Paquier correctly identifies this as a PostgreSQL 19 open item, meaning it needs resolution before release.
Status
As of the thread's end (June 2026), the bug fix has not been committed. Melanie Plageman flags it as one of the oldest open items, noting that while beta1 has been missed, the fix itself is small and should be pushed even if the test patches need more review time. Andres acknowledges losing track and commits to returning to it.