Technical Analysis: Make Stack Depth Check Work with ASAN's use-after-return
Core Problem
PostgreSQL implements a stack depth check (check_stack_depth()) to prevent unbounded recursion from crashing the backend with a segfault. This check works by measuring the distance between the current stack position and a reference point (the stack base), comparing it against max_stack_depth.
The fundamental issue is how the current stack position is measured. Historically, PostgreSQL used the address of a local variable as a proxy for "where the stack pointer currently is." This works under normal compilation because local variables live on the stack, and their addresses monotonically decrease (on downward-growing stacks) as call depth increases.
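The historical approach can be illustrated with a minimal sketch. It is not PostgreSQL's code: every `demo_` name is hypothetical, and the absolute difference is used only so the sketch does not assume a growth direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved reference point (the "stack base"). */
static uintptr_t demo_base_addr = 0;

/* Record the address of a local in an outer frame as the reference point. */
void demo_set_base_from_local(void)
{
    volatile char base_marker = 0;

    demo_base_addr = (uintptr_t) &base_marker;
}

/* Byte distance between the reference point and a local in this frame. */
size_t demo_depth_from_local(void)
{
    volatile char top_marker = 0;
    uintptr_t top = (uintptr_t) &top_marker;

    assert(demo_base_addr != 0);    /* the base must be recorded first */
    /* Absolute difference, so the sketch does not assume growth direction. */
    return demo_base_addr > top ? demo_base_addr - top : top - demo_base_addr;
}

/* Recurse `levels` times and report the depth seen at the innermost call. */
size_t demo_recurse_local(int levels)
{
    volatile char pad[256];         /* makes each frame occupy stack space */
    size_t depth;

    pad[0] = (char) levels;
    if (levels <= 0)
        depth = demo_depth_from_local();
    else
        depth = demo_recurse_local(levels - 1);
    pad[255] = (char) depth;        /* used after the call: no tail call */
    return depth;
}
```

In an ordinary (non-ASAN) build, calling `demo_set_base_from_local()` once and then `demo_recurse_local(n)` should report a distance that grows with `n`, which is the property the depth check relies on.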
Why ASAN Breaks This
AddressSanitizer's detect_stack_use_after_return feature (enabled by default in recent Clang versions) moves local variables from the actual stack frame to heap-allocated "fake frames." This allows ASAN to detect dangling references to stack variables after a function returns. However, this completely breaks PostgreSQL's stack depth measurement:
- A local variable's address now points to the heap, not the stack
- The computed "distance" between the stack base and the local variable is meaningless
- The stack depth check spuriously triggers, causing random test failures
The workaround in CI was setting ASAN_OPTIONS=detect_stack_use_after_return=0. That disables an important sanitizer feature, so use-after-return bugs in PostgreSQL's own code go undetected.
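One way to observe the effect is to compare a local variable's address with the frame address inside the same function. The sketch below assumes a GCC- or Clang-compatible compiler; the `demo_` names and the 64 kB threshold are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte gap between a local's address and this function's frame address. */
size_t demo_local_to_frame_gap(void)
{
    volatile char local = 0;
    uintptr_t frame = (uintptr_t) __builtin_frame_address(0);
    uintptr_t addr = (uintptr_t) &local;

    assert(frame != 0);             /* the builtin should yield an address */
    return frame > addr ? frame - addr : addr - frame;
}

/* Returns 1 when the local appears to live in (or next to) the real frame. */
int demo_local_is_on_real_stack(void)
{
    /* 64 kB is an arbitrary threshold chosen for this sketch. */
    return demo_local_to_frame_gap() < (size_t) 64 * 1024;
}
```

In an ordinary build the local sits within the frame, so the gap is small. With ASAN's fake frames active, the local would be expected to live elsewhere in memory, making the gap large and unrelated to call depth.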
Proposed Solution
Replace the use of a local variable's address with __builtin_frame_address(0) in the stack depth check. This GCC/Clang builtin returns the address of the current function's stack frame (specifically, the frame pointer or equivalent), which:
- Always reflects the true stack position, regardless of ASAN transformations
- Is already used by PostgreSQL for measuring the stack base address (in set_stack_base())
- Is available in GCC, Clang, and compatible compilers
The fix is elegant because it resolves an existing asymmetry: PostgreSQL already trusted __builtin_frame_address() for the stack base but used &local_var for the stack top measurement. Making both sides consistent is architecturally cleaner regardless of the ASAN motivation.
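A minimal sketch of the symmetric measurement, with both the reference point and the current position taken from `__builtin_frame_address(0)`, might look as follows. It assumes a GCC- or Clang-compatible compiler; the `demo_` names and the 100 kB limit are hypothetical and only loosely mirror set_stack_base(), check_stack_depth(), and max_stack_depth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uintptr_t demo_frame_base = 0;           /* hypothetical saved base */
static size_t demo_max_depth = 100 * 1024;      /* hypothetical limit, bytes */

/* Record the current frame address as the reference point. */
void demo_set_frame_base(void)
{
    demo_frame_base = (uintptr_t) __builtin_frame_address(0);
}

/* Byte distance between the reference point and the current frame. */
size_t demo_frame_depth(void)
{
    uintptr_t top = (uintptr_t) __builtin_frame_address(0);

    assert(demo_frame_base != 0);   /* the base must be recorded first */
    return demo_frame_base > top ? demo_frame_base - top
                                 : top - demo_frame_base;
}

/* Returns 1 when the measured depth is over the configured limit. */
int demo_depth_exceeded(void)
{
    return demo_frame_depth() > demo_max_depth;
}

/* Recurse `levels` times and report the check result at the innermost call. */
int demo_recurse_frames(int levels)
{
    volatile char pad[1024];        /* makes each frame occupy stack space */
    int exceeded;

    pad[0] = (char) levels;
    if (levels <= 0)
        exceeded = demo_depth_exceeded();
    else
        exceeded = demo_recurse_frames(levels - 1);
    pad[1023] = (char) exceeded;    /* used after the call: no tail call */
    return exceeded;
}
```

Because neither measurement depends on where a local variable is placed, the result should track real stack usage whether or not locals have been relocated.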
Key Technical Details
Stack Depth Check Mechanism
// Before (simplified):
void check_stack_depth(void) {
    char stack_top_loc;
    long stack_depth = (long)stack_base_ptr - (long)&stack_top_loc;
    // ...
}

// After (simplified):
void check_stack_depth(void) {
    long stack_depth = (long)stack_base_ptr - (long)__builtin_frame_address(0);
    // ...
}
Why __builtin_frame_address(0) Is Safe
- Argument 0 means "current frame": this is well-defined and doesn't require actually walking the stack
- Higher arguments (1, 2, ...) can be unreliable, but 0 is guaranteed correct
- It's already used in PostgreSQL's set_stack_base(), so no new compiler dependency is introduced
Backpatching Considerations
The fix was backpatched to older branches. A noted complication is that older branches used long for the stack distance computation rather than the more appropriate pointer-sized types, but this was left unchanged to minimize backpatch risk.
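The type concern can be made concrete with a small sketch. On LP64 platforms `long` is as wide as a pointer, but on 64-bit Windows (LLP64) it is only 32 bits, so a pointer-width type is the safer choice for address arithmetic. The `demo_` names below are hypothetical, and `ptrdiff_t` is this sketch's choice rather than necessarily the type PostgreSQL uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* uintptr_t is defined to round-trip any object pointer. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t must be able to hold a pointer");

/* Signed byte distance from `top` to `base`, computed at pointer width. */
ptrdiff_t demo_stack_distance(const void *base, const void *top)
{
    /* Subtract as unsigned integers, then reinterpret the result as signed. */
    return (ptrdiff_t) ((uintptr_t) base - (uintptr_t) top);
}

/* Returns 1 if `long` is at least pointer width on this platform. */
int demo_long_is_pointer_width(void)
{
    return sizeof(long) >= sizeof(void *);
}
```

Where `demo_long_is_pointer_width()` returns 0, casting both addresses to `long` would discard the upper address bits before the subtraction.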
Broader Impact
With this fix, PostgreSQL CI can remove detect_stack_use_after_return=0 from ASAN_OPTIONS, enabling ASAN to detect an entire class of bugs (use-after-return) that was previously invisible in CI testing. This strengthens PostgreSQL's automated testing without code behavior changes.
Design Tradeoffs
| Aspect | Decision | Rationale |
|---|---|---|
| Use __builtin_frame_address(0) vs local var address | Use builtin | Correct under ASAN, already used elsewhere, architecturally cleaner |
| Backpatch | Yes | Important for CI on stable branches; low risk since builtin is already used |
| Fix older branch long type | No | Minimize backpatch diff; existing code worked despite type concern |
Residual Issues Noted
Andres mentions that even with this fix, Clang still uses significantly more stack than GCC under ASAN, but considers that a separate issue (likely due to Clang's ASAN implementation inserting more instrumentation code per frame).