Make stack depth check work with asan's use-after-return

First seen: 2026-05-27 13:23:13+00:00 · Messages: 3 · Participants: 2

Latest Update

2026-06-01 · claude-opus-4-6

Technical Analysis: Make Stack Depth Check Work with ASAN's use-after-return

Core Problem

PostgreSQL implements a stack depth check (check_stack_depth()) to prevent unbounded recursion from crashing the backend with a segfault. This check works by measuring the distance between the current stack position and a reference point (the stack base), comparing it against max_stack_depth.

The fundamental issue is how the current stack position is measured. Historically, PostgreSQL used the address of a local variable as a proxy for "where the stack pointer currently is." This works under normal compilation because local variables live on the stack, and their addresses monotonically decrease (on downward-growing stacks) as call depth increases.
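
The local-variable approach described above can be sketched as follows. This is a minimal illustration, not PostgreSQL's code: the names demo_set_stack_base, demo_stack_is_too_deep, and demo_recurse, and the 64 kB limit, are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and limit for illustration; not PostgreSQL's definitions. */
static uintptr_t demo_stack_base;
static const uintptr_t demo_max_depth = 64 * 1024;  /* bytes */

/* Record a reference address near the top of the call chain. */
void demo_set_stack_base(void)
{
    volatile char marker = 0;

    demo_stack_base = (uintptr_t) &marker;
}

/* Estimate depth from the address of a local variable (the historical proxy). */
bool demo_stack_is_too_deep(void)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t) &marker;
    /* Absolute distance, so the sketch does not assume a growth direction. */
    uintptr_t depth = here < demo_stack_base ? demo_stack_base - here
                                             : here - demo_stack_base;

    return depth > demo_max_depth;
}

/* Recurse until the check fires; returns the recursion level reached. */
int demo_recurse(int level)
{
    volatile char pad[1024];    /* make each frame visibly large */
    int reached;

    pad[0] = 0;
    if (demo_stack_is_too_deep())
        return level;
    reached = demo_recurse(level + 1);
    pad[1023] = 0;              /* work after the call prevents tail-call elimination */
    return reached;
}
```

Calling demo_set_stack_base() and then demo_recurse(0) in an ordinary (non-ASAN) build should stop after a few dozen levels, because each level consumes at least 1 kB of real stack. The sketch relies on the local's address tracking the real stack, which is exactly the assumption that stops holding once locals are relocated.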

Why ASAN Breaks This

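AddressSanitizer's detect_stack_use_after_return feature (enabled by default in recent Clang releases) moves local variables from the actual stack frame to heap-allocated "fake frames." This allows ASAN to detect dangling references to stack variables after a function returns. However, it breaks PostgreSQL's stack depth measurement: the address of a local variable then points into a fake frame rather than the real stack, so its distance from the stack base no longer reflects the actual stack depth.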

The workaround in CI was setting ASAN_OPTIONS=detect_stack_use_after_return=0, which disables an important sanitizer feature and reduces ASAN's ability to catch use-after-return bugs in PostgreSQL's own code.

Proposed Solution

Replace the use of a local variable's address with __builtin_frame_address(0) in the stack depth check. This GCC/Clang builtin returns the address of the current function's stack frame (specifically, the frame pointer or equivalent), which:

  1. Always reflects the true stack position, regardless of ASAN transformations
  2. Is already used by PostgreSQL for measuring the stack base address (in set_stack_base())
  3. Is supported by GCC, Clang, and compatible compilers

The fix is elegant because it resolves an existing asymmetry: PostgreSQL already trusted __builtin_frame_address() for the stack base but used &local_var for the stack top measurement. Making both sides consistent is architecturally cleaner regardless of the ASAN motivation.
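
Using the same measurement on both sides can be sketched as below. This is an illustration under stated assumptions: DEMO_STACK_POSITION and the demo_* functions are hypothetical names, and the compiler check stands in for whatever feature test a real build system would perform. The fallback branch uses a local's address for compilers without the builtin.

```c
#include <stdint.h>

/*
 * Hypothetical macro for this sketch. It must be a macro rather than a
 * function so that the builtin is evaluated in the caller's frame.
 */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_STACK_POSITION(local) \
    ((void) &(local), (uintptr_t) __builtin_frame_address(0))
#else
#define DEMO_STACK_POSITION(local) ((uintptr_t) &(local))
#endif

static uintptr_t demo_base;

/* Record the reference point with the same primitive used for measuring. */
void demo_record_base(void)
{
    char marker = 0;

    demo_base = DEMO_STACK_POSITION(marker);
}

/* Distance between the current position and the recorded base, in bytes. */
uintptr_t demo_current_depth(void)
{
    char marker = 0;
    uintptr_t here = DEMO_STACK_POSITION(marker);

    return here < demo_base ? demo_base - here : here - demo_base;
}

/* Measure the depth after descending the given number of levels. */
uintptr_t demo_depth_at(int levels)
{
    volatile char pad[256];     /* give each level a noticeable frame */
    uintptr_t depth;

    pad[0] = 0;
    depth = levels > 0 ? demo_depth_at(levels - 1) : demo_current_depth();
    pad[255] = 0;               /* work after the call prevents tail-call elimination */
    return depth;
}
```

After demo_record_base(), demo_depth_at(100) should report a larger distance than demo_depth_at(1), since both the base and the current position are taken from real frame addresses.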

Key Technical Details

Stack Depth Check Mechanism

// Before (simplified):
void check_stack_depth(void) {
    char stack_top_loc;
    long stack_depth = (long)stack_base_ptr - (long)&stack_top_loc;
    // ...
}

// After (simplified):
void check_stack_depth(void) {
    long stack_depth = (long)stack_base_ptr - (long)__builtin_frame_address(0);
    // ...
}

Why __builtin_frame_address(0) Is Safe
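
As stated under Proposed Solution, the builtin reports the real frame regardless of ASAN transformations, whereas a local variable's address follows the variable wherever it is placed. A small probe, offered only as a sketch with hypothetical names (demo_probe, demo_probe_here, demo_gap, demo_report), records both values in the same function so they can be compared. In an ordinary build the gap is expected to be small; with detect_stack_use_after_return active, the local may live in a fake frame and the gap can be arbitrary.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the two candidate "stack position" values. */
typedef struct
{
    uintptr_t local_addr;   /* address of a local variable */
    uintptr_t frame_addr;   /* result of __builtin_frame_address(0) */
} demo_probe;

/* Capture both values from inside one function (GCC/Clang only). */
demo_probe demo_probe_here(void)
{
    volatile char local = 0;
    demo_probe p;

    p.local_addr = (uintptr_t) &local;
    p.frame_addr = (uintptr_t) __builtin_frame_address(0);
    return p;
}

/* Absolute distance between the two values, in bytes. */
uintptr_t demo_gap(demo_probe p)
{
    return p.local_addr < p.frame_addr ? p.frame_addr - p.local_addr
                                       : p.local_addr - p.frame_addr;
}

/* Print the probe so builds with and without ASAN can be compared by eye. */
void demo_report(void)
{
    demo_probe p = demo_probe_here();

    printf("local=0x%" PRIxPTR " frame=0x%" PRIxPTR " gap=%" PRIuPTR "\n",
           p.local_addr, p.frame_addr, demo_gap(p));
}
```

Running demo_report() from the same program built once normally and once with ASAN is one way to see whether the two values still agree.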

Backpatching Considerations

The fix was backpatched to older branches. A noted complication is that older branches used long for the stack distance computation rather than the more appropriate pointer-sized types, but this was left unchanged to minimize backpatch risk.
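
The type concern can be illustrated with a short sketch. The function names are hypothetical, and the first function only mirrors the cast style of the simplified snippet above. On platforms where long is narrower than a pointer (64-bit Windows, for example, where long is 32 bits), casting a pointer to long can truncate it, while intptr_t and ptrdiff_t are sized for pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Cast style from the simplified snippet: both pointers go through long. */
long demo_distance_long(const char *base, const char *top)
{
    return (long) base - (long) top;
}

/* Pointer-sized alternative: intptr_t holds any object pointer value. */
ptrdiff_t demo_distance_ptr(const char *base, const char *top)
{
    return (ptrdiff_t) ((intptr_t) base - (intptr_t) top);
}
```

On common Unix-like 64-bit platforms, where long and pointers have the same width, the two functions return the same value, which is consistent with the existing code having worked despite the concern.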

Broader Impact

With this fix, PostgreSQL CI can remove detect_stack_use_after_return=0 from ASAN_OPTIONS, enabling ASAN to detect an entire class of bugs (use-after-return) that was previously invisible in CI testing. This strengthens PostgreSQL's automated testing without changing the code's behavior.

Design Tradeoffs

| Aspect | Decision | Rationale |
|---|---|---|
| Use __builtin_frame_address(0) vs local var address | Use builtin | Correct under ASAN, already used elsewhere, architecturally cleaner |
| Backpatch | Yes | Important for CI on stable branches; low risk since builtin is already used |
| Fix older branch long type | No | Minimize backpatch diff; existing code worked despite type concern |

Residual Issues Noted

Andres mentions that even with this fix, Clang still uses significantly more stack than GCC under ASAN, but considers that a separate issue (likely due to Clang's ASAN implementation inserting more instrumentation code per frame).