2026-05-18 · claude-opus-4-6
Technical Analysis: Improving Test Failure Output in PostgreSQL's Meson Build System
Core Problem
PostgreSQL's test infrastructure, particularly when using the Meson build system, produces deeply unhelpful output when tests fail. The fundamental issue is an information accessibility gap: when a test fails, the diagnostically useful information (regression diffs, command stderr/stdout, actual error messages) is buried in log files scattered across the build directory, requiring manual navigation to discover what went wrong.
This problem manifests in two contexts:
- CI environments (Cirrus CI): Developers must click through a file browser UI to find regression.diffs or log files, a tedious multi-step process for every failure.
- Local development: Meson's TAP output shows only pass/fail status without the reason for failure. Developers must hunt through build/testrun/ directory trees for relevant log files.
The Makefile-based build system was slightly better here because it showed more inline information, but Meson strictly shows only the TAP protocol output, making the problem worse.
Architectural Context
The test infrastructure involves several layers:
- pg_regress: The C program that runs SQL regression tests, compares output via diff, and produces regression.diffs
- TAP test framework: Perl-based testing using Test::More, orchestrated through PostgreSQL::Test::Utils and PostgreSQL::Test::Cluster
- Meson test runner: Captures TAP output from test programs and displays results
- IPC::Run: Perl module used to spawn subprocesses for command execution in TAP tests
The key architectural constraint is that Meson can only display what the test programs emit via TAP protocol (stdout). Any diagnostic information must be encoded as TAP diag() output to be visible in the final test results.
Proposed Solutions (Patch Series)
Patch 0001: Include diffs in TAP output from pg_regress
Modifies pg_regress to emit the first N lines of regression.diffs as TAP diagnostic output when tests fail. This makes the actual test differences immediately visible without file navigation.
Technical details:
- Reads the combined diff file after test completion
- Emits lines using TAP diagnostic format (# prefix)
- Originally limited to 80 lines; later revised based on feedback
- Required careful handling of line counting (a line longer than 1023 characters is read in several fgets calls because of the buffer size, so it counts as multiple lines)
- On Windows, required close/reopen logic for the diffs file due to file locking semantics ("The process cannot access the file because it is being used by another process")
- Introduced DIAG_DETAIL and DIAG_END markers (analogous to existing NOTE_DETAIL/NOTE_END)
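The line-limit and buffer behavior described above can be sketched in a few lines. This is an illustrative Python analogy, not the pg_regress code (which is C): the function name diff_head_as_tap_diag, its parameters, and the 1024-byte buffer size are assumptions chosen to match the "1023 chars" figure in the text.

```python
import io

# Assumed buffer size: fgets(buf, 1024, f) returns at most 1023 characters.
BUF_SIZE = 1024


def diff_head_as_tap_diag(stream, max_lines=80, buf_size=BUF_SIZE):
    """Return up to max_lines TAP diagnostic lines from the start of stream.

    Each read returns at most buf_size - 1 characters, so an over-long
    line is consumed in several chunks and every chunk counts against
    the limit, mirroring a fixed-size fgets buffer.
    """
    out = []
    while len(out) < max_lines:
        chunk = stream.readline(buf_size - 1)
        if not chunk:  # end of file
            break
        # TAP diagnostics are lines that start with "#".
        out.append("# " + chunk.rstrip("\n"))
    return out


if __name__ == "__main__":
    sample = io.StringIO("--- expected\n+++ results\n-1\n+2\n")
    print("\n".join(diff_head_as_tap_diag(sample, max_lines=3)))
```

With this scheme a single 2000-character diff line uses two of the available slots, which is why the limit is better thought of as a number of buffer reads than a number of logical lines.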
Patch 0002: Improved command_ok/command_fails output
Replaces the run_log() wrapper with direct IPC::Run::run calls that capture stdout/stderr, then display them via diag() on failure.
Technical subtlety: Cannot use simple variable capture (\$stdout) for daemon-spawning commands like pg_ctl start because the child postmaster inherits the file descriptors and outlives pg_ctl, causing IPC::Run to hang waiting for EOF. The solution redirects output to tempfiles instead, mimicking the existing command_like_safe pattern.
Behavioral change noted in review: The old run_log printed a "# Running: ..." line even on success. The new version only prints command details on failure, reducing successful test log verbosity.
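The tempfile approach and the failure-only reporting can be illustrated together. The following is a Python sketch of the general technique, not the Perl code in PostgreSQL::Test::Utils; run_to_tempfiles and command_ok_lines are hypothetical names, and the exact diagnostic wording is invented for the example.

```python
import subprocess
import sys
import tempfile


def run_to_tempfiles(cmd):
    """Run cmd with stdout/stderr redirected to temp files, not pipes.

    Returns (exit_code, stdout_text, stderr_text). Nothing here waits
    for EOF on a pipe, so a long-lived grandchild that inherited the
    descriptors cannot keep the caller blocked after cmd itself exits.
    """
    with tempfile.TemporaryFile("w+") as out, tempfile.TemporaryFile("w+") as err:
        proc = subprocess.run(cmd, stdout=out, stderr=err)
        out.seek(0)
        err.seek(0)
        return proc.returncode, out.read(), err.read()


def command_ok_lines(cmd, name):
    """Return TAP lines for one command check.

    Quiet on success; on failure the command, exit code, and captured
    output are attached as "# ..." diagnostic lines.
    """
    rc, out, err = run_to_tempfiles(cmd)
    if rc == 0:
        return [f"ok - {name}"]
    lines = [f"not ok - {name}", f"# command: {' '.join(cmd)}", f"# exit code: {rc}"]
    lines += [f"# stdout: {line}" for line in out.splitlines()]
    lines += [f"# stderr: {line}" for line in err.splitlines()]
    return lines


if __name__ == "__main__":
    # A failing command: its stderr and exit code appear as diagnostics.
    failing = [sys.executable, "-c", "import sys; sys.exit('boom')"]
    print("\n".join(command_ok_lines(failing, "demo command")))
```

The design choice is the same trade the patch makes: a file has no reader waiting for every writer to close it, so the wait ends when the direct child exits, at the cost of reading the output back from disk afterwards.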
Patch 0003: Replace die with croak in .pm files
Changes error reporting in the test infrastructure modules (Cluster.pm, Utils.pm) to use Carp::croak instead of die. This causes error messages to report the caller's location (the actual test file) rather than the infrastructure module's line number.
Key detail: The committer (Andrew Dunstan) added @CARP_NOT in Utils.pm so that croak() would look past Cluster.pm to the actual TAP script caller, solving the cross-module attribution problem.
Patch 0004: Use done_testing()
Adds done_testing() calls to avoid the unhelpful "Tests were run but no plan was declared and done_testing() was not seen" messages on failures. This was originally suggested by Andrew Dunstan in a 2022 thread.
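The message exists because a TAP consumer uses the plan line (1..N) to tell a complete run from one that died midway, and done_testing() is what emits that plan after the last test. The check below is a Python sketch of that consumer-side reasoning only; check_tap_plan is a hypothetical name and this is not how Test::More or Meson implement it.

```python
import re


def check_tap_plan(lines):
    """Classify a TAP stream by its plan line.

    Returns "ok" when a plan is present and matches the number of test
    lines, "no plan" when no 1..N line was seen, and "plan mismatch"
    when the counts disagree.
    """
    tests = 0
    plan = None
    for line in lines:
        if re.match(r"(not )?ok\b", line):
            tests += 1
            continue
        m = re.match(r"1\.\.(\d+)\s*$", line)
        if m:
            plan = int(m.group(1))
    if plan is None:
        return "no plan"
    return "ok" if plan == tests else "plan mismatch"


if __name__ == "__main__":
    # A script that died before emitting its plan.
    print(check_tap_plan(["ok 1 - setup", "not ok 2 - query"]))
```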
Patch 0005: Convert pg_upgrade and stream_regress tests to use command_ok
Migrates 002_pg_upgrade.pl and 027_stream_regress.pl to use the improved command_ok function, gaining the better failure output.
Key Design Tradeoffs and Disagreements
Output Volume vs. Usefulness (Peter Eisentraut's Objection)
Peter Eisentraut raised a significant objection post-commit: 80 lines of diff output, with lines potentially wider than terminal width, can produce 200+ wrapped lines that push the test summary off-screen, making the overall output less useful than before.
His specific concerns:
- Terminal overflow: With typical terminal heights of 40-60 lines, 80 diff lines swamp the screen
- Line wrapping: Diff lines are often wider than terminal width, multiplying the effective line count
- Crash cascades: When a server crashes, the first diffs shown are often from other tests that failed due to the crash, not the crashing test itself — making the truncated output misleading
- Stream synchronization: The diff output appears in the middle of test runs due to stdout/stderr buffering issues, as visible in buildfarm output
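The wrapping arithmetic behind the first two concerns is easy to check. The helper below is an illustrative Python sketch; wrapped_rows is a hypothetical name, and the 200-character line length and 80-column terminal are assumed figures, not measurements from the thread.

```python
import math


def wrapped_rows(lines, width=80):
    """Number of terminal rows needed to display lines at the given width.

    Every line takes at least one row; longer lines take
    ceil(len(line) / width) rows.
    """
    return sum(max(1, math.ceil(len(line) / width)) for line in lines)


if __name__ == "__main__":
    # 80 diff lines of 200 characters each on an 80-column terminal.
    print(wrapped_rows(["x" * 200] * 80, width=80))
```

Under those assumptions each line wraps to 3 rows, so 80 diff lines occupy 240 rows, several screens on a terminal of 40 to 60 rows.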
Resolution Approach
Andrew Dunstan proposed gating the behavior on an environment variable, adding it to cirrus.tasks.yml for CI. Jelte Fennema-Nio objected that this would break the case where pg_regress is called from TAP tests (like 027_stream_regress.pl). Jelte's counter-proposal (v5 patch, May 2026) addresses the concerns differently while preserving the TAP-embedded diff functionality.
Buildfarm Redundancy
Peter noted that the buildfarm already provides good failure navigation (links to detailed output below the summary). The new feature is somewhat redundant there and may actually degrade the buildfarm's presentation by injecting diff content that disrupts timing information display (as Alexander Lakhin demonstrated with the drongo buildfarm animal).
Technical Insights
- Windows file locking: Windows enforces sharing restrictions on open file handles, whereas POSIX systems generally let several processes open the same file freely. Writing to regression.diffs while pg_regress still has it open therefore requires close/reopen cycles.
- Daemon process file descriptor inheritance: The pg_ctl start/restart case is uniquely problematic because the spawned postmaster inherits stdout/stderr FDs. This is a well-known Unix process management issue that makes simple pipe-to-variable capture dangerous (potential indefinite hang).
- TAP protocol limitations: TAP diagnostic lines (# ...) are the only mechanism to inject information visible in Meson's output. This is both the solution and the constraint: everything must be encoded in this narrow channel.
- Carp vs die semantics: die reports the error at the point of failure in library code; croak reports it at the caller's location. For test infrastructure, the caller location is almost always what developers need. The @CARP_NOT mechanism lets croak skip frames in the listed intermediate modules when choosing which caller to report.
- The "first lines" heuristic: The assumption that the first N lines of diffs contain the most useful information is contested. It works for single-test failures but fails for crash scenarios where cascade failures dominate the beginning of the diff file.