pg_rewind: Skip Vanished Source Files During Traversal
Core Problem
This thread identifies a potential bug in pg_rewind's source directory traversal logic within recurse_dir(). The issue concerns a race condition handling path where lstat() returns ENOENT (file not found), but execution continues past that error-handling branch and subsequently inspects fst.st_mode — a field that was never populated because lstat() failed.
Technical Context
pg_rewind is PostgreSQL's tool for resynchronizing a former primary's data directory with a new primary after failover, allowing the old primary to rejoin as a standby. During operation, it traverses the source server's data directory to build a file map of what needs to be copied.
When running against a live source (i.e., the new primary is actively running), files can disappear between readdir() returning the directory entry and lstat() being called on it. This is a classic TOCTOU (time-of-check-to-time-of-use) race condition. The existing code correctly identifies this scenario and has a comment acknowledging that ENOENT from lstat() is acceptable when the source is a running server.
The Bug
The problem is in the control flow after the ENOENT check:
readdir()returns an entry namelstat()is called on that entry- If
lstat()fails withENOENT, the code logs/acknowledges this is OK - But execution falls through to code that inspects
fst.st_mode
Since lstat() failed, the struct stat fst buffer contains uninitialized or stale data. The subsequent S_ISREG(fst.st_mode), S_ISDIR(fst.st_mode), and S_ISLNK(fst.st_mode) checks operate on garbage values. This could lead to:
- Invoking the callback with incorrect file type information
- Attempting to recurse into a "directory" that doesn't exist
- Undefined behavior from reading uninitialized memory (depending on stack state)
In practice, this bug may rarely manifest because:
- The race window is small
- On many platforms, the uninitialized mode bits may not match any of the
S_IS*macros, causing the entry to be silently skipped anyway - The callback might fail gracefully when trying to access the vanished file
However, it's still a correctness bug — the code's intent (as expressed by the comment) is clearly to skip vanished entries, but the implementation doesn't actually do so.
Proposed Solution
The patch adds a continue statement in the ENOENT branch, causing the loop to immediately move to the next directory entry rather than falling through to the fst.st_mode inspection code. This is a minimal, surgical fix that aligns the implementation with the documented intent.
if (errno == ENOENT)
{
/* File doesn't exist anymore. This is ok, if the new primary
* is running and the file was just removed. */
continue; /* <-- proposed addition */
}
Risk Assessment
This is an extremely low-risk change:
- It only affects the error path (ENOENT from lstat)
- The
continuemakes the code do what the comment already says it should do - No behavioral change for the normal (non-error) path
- The testing approach (lstat shim returning ENOENT) demonstrates the behavioral difference clearly
Architectural Significance
While this is a small bug fix, it touches on an important aspect of pg_rewind's design: operating against live targets. pg_rewind was originally designed to work against stopped servers, and the live-server support (via libpq connection) introduced these race conditions. The file traversal code must be robust against files appearing and disappearing during traversal. This fix strengthens that robustness.
The pattern of "acknowledge error but fail to short-circuit" is a common source of bugs in C code with complex control flow, making this a useful example for code review practices.