pgbench: make verbose error messages thread-safe

First seen: 2026-04-24 06:26:03+00:00 · Messages: 6 · Participants: 4

Latest Update

2026-05-14 · claude-opus-4-6

pgbench: Thread-Safety Bug in Verbose Error Messages

Core Problem

A thread-safety bug was discovered in pgbench's printVerboseErrorMessages() function. The function uses a function-local static PQExpBuffer to build verbose error output. In C, a static local variable persists across invocations and is shared by all threads — meaning when pgbench runs with multiple threads (-j > 1) and --verbose-errors is enabled, concurrent threads can corrupt each other's message buffer. This manifests as garbled or corrupted error output.

The root cause traces back to commit 4a39f87acd6e, which introduced the --verbose-errors feature in PostgreSQL v15. The use of a static buffer was likely a micro-optimization to avoid repeated heap allocation — the buffer is allocated once and then reused via resetPQExpBuffer() on subsequent calls. This pattern is perfectly safe in single-threaded code but is fundamentally broken in pgbench's multi-threaded architecture.

Why This Matters Architecturally

pgbench's threading model divides work into TState (per-thread state) and CState (per-connection state). Multiple CState objects are driven by a single TState thread. Any shared mutable state that isn't explicitly protected by synchronization is a data race. The static buffer here is an instance of implicit global mutable state — it doesn't appear in any struct but is effectively a global shared by all threads through the function's calling convention. This class of bug is particularly insidious because it only triggers under specific runtime conditions (multiple threads + errors occurring + verbose mode enabled).

Proposed Solutions

1. Local PQExpBufferData (Proposed Patch — Accepted)

Fujii's patch replaces the static PQExpBuffer (pointer to heap-allocated, long-lived buffer) with a stack-local PQExpBufferData (value type, allocated/freed each call). This is the simplest possible fix:

// Before (broken):
static PQExpBuffer buf = NULL;
if (buf == NULL)
    buf = createPQExpBuffer();
else
    resetPQExpBuffer(buf);

// After (fixed):
PQExpBufferData buf;
initPQExpBuffer(&buf);
// ... use buf ...
termPQExpBuffer(&buf);

Each thread gets its own stack-allocated buffer. The tradeoff is that every invocation now performs a malloc/free cycle (via initPQExpBuffer/termPQExpBuffer) rather than reusing a single allocation.

2. Per-CState Buffer (Michael Paquier's Suggestion)

Store the PQExpBuffer in each CState struct, making it per-connection. This avoids repeated allocation while maintaining thread safety since each connection's state is only accessed by its owning thread.

3. Per-TState Buffer (Evan Li's Refinement)

Store the buffer in TState rather than CState. This is more correct from a resource-efficiency standpoint: since the concern is thread-safety (not connection-safety), one buffer per thread suffices. Multiple connections within the same thread don't execute concurrently, so a single buffer per TState can be safely reused across all its CState instances.

Resolution and Rationale

The community converged on the simplest fix (option 1) based on a practical performance argument: the printVerboseErrorMessages() function is only invoked when:

  1. --verbose-errors is explicitly enabled, AND
  2. An actual error occurs

In both conditions, the dominant cost is I/O (writing the log message), not memory allocation. The overhead of one malloc/free per error message is negligible compared to the fprintf or equivalent that follows. Introducing a new field into TState or CState would add structural complexity for an optimization that provides no measurable benefit in the error-handling hot path.

This is a good example of choosing the right fix granularity — the simplest correct solution is preferred when the theoretical performance concern is dominated by other costs in the same code path.

Backport Implications

Since the bug was introduced in v15 with the --verbose-errors feature, the fix should be backpatched to v15 through the current release. The patch is minimal and low-risk, modifying only the lifetime of a single buffer variable within a single function.