Having problems generating a code coverage report

First seen: 2024-10-30 21:21:10+00:00 · Messages: 26 · Participants: 8

Latest Update

2026-05-06 · opus 4.7

Overview

This thread is not about a PostgreSQL code defect per se. It documents a long-running, collective struggle to keep PostgreSQL's code coverage reporting pipeline (make coverage-html / ninja coverage-html) usable against an increasingly strict and unstable lcov/genhtml toolchain (LCOV 2.x). What begins as "one user can't generate a report" evolves into a multi-distro, multi-year realization that the coverage infrastructure is de facto broken on modern Linux distributions, including the machine that powers the official coverage.postgresql.org.

The discussion is technically shallow on PostgreSQL internals but operationally important: coverage reports are a primary tool reviewers and committers use to evaluate whether new tests exercise new code paths, and they are the backing data for the public coverage site.

The Core Problem

LCOV transitioned from the fairly permissive 1.x line to the much stricter 2.x line. LCOV 2.0 introduced (and subsequent 2.x releases tightened) a catalog of "consistency" and "sanity" checks that now cause hard errors — not warnings — where 1.16 would silently cope. Several distinct failure modes are observed across the thread:

  1. Negative hit counts (Unexpected negative count '-3' ... snprintf.c:532). Caused by GCC's non-atomic profile counter updates when shared libraries accumulate counts across multiple processes/threads. The compiler hint is -fprofile-update=atomic. src/port/snprintf.c recurs here because it is linked into many shared libraries (libpq, ecpg, backend via snprintf_shlib), so its .gcda is updated by concurrent processes during the test suite, producing race-condition counter corruption.

  2. duplicate merge record src/include/catalog. genhtml 2.x refuses to merge two coverage records that map to the same logical directory. PostgreSQL builds produce .gcno/.gcda files under both the source tree (src/include/catalog) and the build tree (which also re-creates src/include/catalog/ for generated catalog headers like pg_proc_d.h). Under a non-VPATH build, the two paths collide.

  3. duplicate file ./src/fe_utils/astreamer_gzip.gcno in both . and .. Directly caused by src/Makefile.global.in invoking lcov -d . -d $(srcdir). In a non-VPATH build . and $(srcdir) resolve to the same directory, and LCOV 2.x now diagnoses this rather than deduplicating.

  4. no data for line:864 ... psqlscanslash.l. Flex-generated .c files contain #line directives pointing back at the .l source. LCOV 2.x checks that every line in the source actually maps to a generated line; when it cannot find a mapping (because flex does not emit #line for every line), it raises an error of class unmapped.

  5. unexecuted block on non-branch line with non-zero hit count. Output of GCC's branch probability data conflicting with LCOV's internal consistency model; classified as inconsistent.

  6. Deprecated RC keys (lcov_branch_coverage vs branch_coverage) — cosmetic but surfaces as warnings that can be promoted to errors.

The unifying theme is that LCOV 2.x has become extremely opinionated about GCC's gcov output, while GCC's output has itself become looser (atomic-update races, block-vs-branch accounting changes). PostgreSQL's Makefile integration (src/Makefile.global.in, where commit c3d9a66024a9 is cited as having added the redundant -d $(srcdir)) and the Meson integration (which shells out to mesonbuild/scripts/coverage.py) both predate these stricter rules.

The Solution Space Explored

No clean fix emerges. What emerges instead is a progression of increasingly baroque workarounds:

1. Downgrade LCOV (Aleksander, Peter Geoghegan)

Build lcov 1.16 from the linux-test-project/lcov git tag into a user prefix and point the build at it. Under autoconf this is straightforward: LCOV, GENHTML, GCOV, CC are respected as environment variables. Under Meson this is much harder, because Meson's coverage.py discovers tools via its own probing. This is one of the few places the autoconf/Meson asymmetry is called out explicitly as a usability regression for developers.

2. -fprofile-update=atomic (documented hint)

Cures the negative-counter class of errors, but exposes the next layer (duplicate merge records, unmapped lines).

3. Escalating --ignore-errors incantations (Tom Lane, Peter Geoghegan, Álvaro, Michael)

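The canonical trick, discovered by Tom Lane, is that naming an error class twice does more than naming it once: a single mention demotes the error to a warning, and a second mention silences the warning as well (--ignore-errors unmapped,unmapped). Over several months the set of classes that has to be ignored keeps growing.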

Álvaro's observation that these variables must be passed as make arguments, not environment variables, follows from make's assignment semantics (= versus ?=): GENHTML_FLAGS is assigned inside src/Makefile.global.in, and that assignment overrides a shell-exported value, whereas a variable given on the make command line overrides the makefile's assignment.
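The two points above (doubling an error class, and passing the flags as make arguments) can be sketched together. This is a minimal illustration, not a command line taken from the thread: the helper names are hypothetical, the error classes are just the ones named in this summary, and overriding a variable on the make command line replaces whatever default the makefile assigns to it, so any default flags would need to be repeated.

```python
import shlex


def ignore_errors_flag(classes, suppress=True):
    """Build one --ignore-errors option for lcov/genhtml 2.x.

    Each class is listed twice when suppress is True (the doubling trick
    described above) and once otherwise.
    """
    repeat = 2 if suppress else 1
    parts = [name for name in classes for _ in range(repeat)]
    return "--ignore-errors " + ",".join(parts)


def make_coverage_argv(genhtml_classes, lcov_classes):
    """Return an argv that passes the flags as make arguments.

    Variables given as make arguments take precedence over assignments
    inside the makefile; environment variables do not.
    """
    return [
        "make",
        "coverage-html",
        "GENHTML_FLAGS=" + ignore_errors_flag(genhtml_classes),
        "LCOVFLAGS=" + ignore_errors_flag(lcov_classes),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(make_coverage_argv(["unmapped"], ["inconsistent"])))
```

Building the argv as a list and quoting it only for display avoids shell-quoting mistakes around the space inside each variable value.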

4. Build directory placement (Andres Freund)

Andres's key empirical insight: the duplicate-record family of errors only occurs when the build directory is inside the source directory. Moving the build directory outside (e.g., ../build/) sidesteps the duplicate-path collision under Meson. This effectively rediscovers that VPATH/out-of-tree builds are the only mode LCOV 2.x tolerates. Meson always uses a separate build directory, so this is easy to arrange there, although a build directory placed inside the source tree still trips the check via relative-path resolution. For autoconf in-tree builds the collision is inherent.
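A quick pre-flight check for this condition might look like the following. It is only a sketch: the function name and the example paths are hypothetical, and it tests directory nesting, not whether lcov will actually fail.

```python
from pathlib import Path


def build_dir_is_inside_source(source_dir, build_dir):
    """Return True if build_dir is the source dir itself or nested below it."""
    src = Path(source_dir).resolve()
    bld = Path(build_dir).resolve()
    return bld == src or src in bld.parents


if __name__ == "__main__":
    # Hypothetical layouts: a nested build dir versus a sibling build dir.
    print(build_dir_is_inside_source("/work/postgres", "/work/postgres/build"))
    print(build_dir_is_inside_source("/work/postgres", "/work/build"))
```

Resolving both paths first matters because the problem described above involves relative paths such as `.` and `..` that name the same location in different ways.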

5. .lcovrc centralization (Andres Freund)

Andres proposes a project-level or user-level .lcovrc:

ignore_errors=inconsistent,gcov,range
check_data_consistency=0
stop_on_error=0
genhtml_hierarchical=1
genhtml_show_navigation=1
parallel=16
geninfo_gcov_tool=/usr/bin/gcov-15

This is cleaner than command-line flags and also enables genhtml_hierarchical=1, which produces better navigation for PostgreSQL's deep directory structure.
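Because a .lcovrc is a flat list of key = value settings, it is easy to audit for the deprecated key mentioned earlier (lcov_branch_coverage, now branch_coverage). The sketch below is an assumption-laden helper, not part of lcov: the function names are hypothetical and the rename table contains only the one pair named in this summary.

```python
# Only the rename mentioned in this summary is listed here.
RENAMED_KEYS = {"lcov_branch_coverage": "branch_coverage"}


def parse_lcovrc(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings.append((key.strip(), value.strip()))
    return settings


def deprecated_keys(text):
    """Return (old_key, new_key) pairs for deprecated keys found in text."""
    return [(key, RENAMED_KEYS[key])
            for key, _ in parse_lcovrc(text)
            if key in RENAMED_KEYS]


if __name__ == "__main__":
    sample = "# example\nlcov_branch_coverage = 1\ngenhtml_hierarchical=1\n"
    for old, new in deprecated_keys(sample):
        print(f"{old} is deprecated; use {new}")
```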

6. Patching src/Makefile.global.in (Michael Paquier)

Michael proposes effectively reverting commit c3d9a66024a9, removing the redundant -d $(srcdir) argument from the lcov invocation. This fixes non-VPATH autoconf builds but breaks VPATH builds, which legitimately need both directories: .gcno (compile-time notes) and .gcda (runtime data) live in the build tree, while in some generated-file cases a .gcno ends up next to the source. No committer consensus forms around the patch.

Key Technical Insights

Why src/port/snprintf.c is the canary

It appears in nearly every failure report because it's the single file most heavily multiplexed across shared libraries (libpq, ecpg_compat, libpgport_shlib, server). Each shared library gets its own .gcda file (e.g., snprintf_shlib.gcda), all of which record hits into counters that are not atomic across processes by default. The negative-count artifact is a well-known gcov race and the reason -fprofile-update=atomic exists.
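One way to picture how lost counter updates can turn into a negative number is a simplified flow-conservation model. The model is an assumption made for illustration, not a description of gcov's actual algorithm: it supposes that one count is derived by subtracting a measured branch count from a measured block-entry count, so a race that loses updates on only one of the two counters leaves the difference below zero. All names and numbers are hypothetical.

```python
def racy_increment(counter, updates, lost):
    """Apply `updates` increments, of which `lost` are overwritten by a racing writer."""
    return counter + updates - lost


def derived_fallthrough(block_entries, taken_branch):
    """Derive the unmeasured arc's count as entries minus the measured arc."""
    return block_entries - taken_branch


if __name__ == "__main__":
    # Without a race: 10 entries, branch taken 10 times, fall-through is 0.
    print(derived_fallthrough(racy_increment(0, 10, 0), racy_increment(0, 10, 0)))
    # With 3 lost updates on the entry counter only, the derived count is -3.
    print(derived_fallthrough(racy_increment(0, 10, 3), racy_increment(0, 10, 0)))
```

Under this model, atomic counter updates remove the lost increments, so the subtraction can no longer go negative.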

Why generated files (.l, .y, catalog *_d.h) break genhtml

LCOV 2.x aggressively enforces that source files referenced in .gcno actually exist on disk at the referenced path. Flex/bison generated C files carry #line directives back to the .l/.y, but lcov's line mapping sometimes can't round-trip (e.g., prologue lines in psqlscanslash.l). The unmapped error class exists specifically for this. Generated catalog headers living in both src/include/catalog/ (source, committed stubs) and build/src/include/catalog/ (generated headers) produce the duplicate merge record — LCOV sees two "canonical" locations for the same logical file.
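The line-mapping gap can be illustrated with a small scanner that follows #line directives through generated C text and records which lines of the original source they cover. This is a sketch of the idea only, not how lcov or genhtml is implemented; the function name and the file names scan.l and scan.c are hypothetical.

```python
import re

# Matches directives such as:  #line 123 "scan.l"
LINE_DIRECTIVE = re.compile(r'^\s*#\s*line\s+(\d+)\s+"([^"]+)"')


def mapped_source_lines(generated_text, source_name):
    """Return the line numbers of source_name reachable via #line directives."""
    mapped = set()
    current_file = None
    current_line = 0
    for text_line in generated_text.splitlines():
        match = LINE_DIRECTIVE.match(text_line)
        if match:
            # The directive says the NEXT line is line N of the named file.
            current_line = int(match.group(1))
            current_file = match.group(2)
            continue
        if current_file == source_name:
            mapped.add(current_line)
        current_line += 1
    return mapped


if __name__ == "__main__":
    generated = (
        'int setup;\n'
        '#line 3 "scan.l"\n'
        'int a;\n'
        'int b;\n'
        '#line 40 "scan.c"\n'
        'int internal;\n'
        '#line 7 "scan.l"\n'
        'int c;\n'
    )
    mapped = mapped_source_lines(generated, "scan.l")
    # Lines that coverage data mentions but no directive maps back to.
    print(sorted({3, 5, 7} - mapped))
```

Any source line that coverage data refers to but that never appears in the mapped set is the kind of line the unmapped class complains about.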

The autoconf vs Meson asymmetry

Autoconf's src/Makefile.global.in exposes LCOV, GENHTML, GCOV as overridable variables, plus LCOVFLAGS/GENHTML_FLAGS. Meson's coverage-html target shells out to mesonbuild/scripts/coverage.py, which hardcodes argument lists and discovers lcov via shutil.which. Meson users have fewer escape hatches — a .lcovrc is nearly their only option because they can't easily inject --ignore-errors into the captured command line.
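If tool discovery really is a plain PATH lookup, as described above, then one remaining escape hatch is to put a preferred prefix first on PATH before invoking the coverage target. The sketch below only demonstrates the lookup order with shutil.which; the helper name and the ~/lcov-1.16 prefix are hypothetical, and whether a given Meson version honours this is an assumption to verify locally.

```python
import os
import shutil


def find_tool(name, preferred_prefix=None):
    """Locate a tool by PATH lookup, optionally searching <prefix>/bin first."""
    search_path = os.environ.get("PATH", os.defpath)
    if preferred_prefix:
        preferred_bin = os.path.join(preferred_prefix, "bin")
        search_path = preferred_bin + os.pathsep + search_path
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # Hypothetical user prefix holding a locally built lcov.
    print(find_tool("lcov", os.path.expanduser("~/lcov-1.16")))
```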

Operational impact on coverage.postgresql.org

Stefan Kaltenbrunner's report — that the production coverage site broke after upgrading to Debian Trixie — is the most consequential moment in the thread. Álvaro, who runs the site, ultimately gets it limping again with a custom flag cocktail, but notes CSS regressions (caching artifact) and that branch coverage no longer appears despite lcov_branch_coverage = 0 being removed from .lcovrc. The community has no tested, committed solution; the public site runs on ad-hoc shell magic.

No upstream PostgreSQL fix lands

Over ~18 months of the thread, no commit fixes the underlying integration. Michael's patch is posted but never committed because it only helps one of three build modes. The conclusion is effectively resignation: "A coverage report that just works happens to be good enough for me these days" (Michael, May 2026).

Participant Dynamics

The stance alignment is unusual: no disagreement on what is wrong, only on what workaround is least ugly. No committer defends the status quo.

What Would a Real Fix Look Like?

Reading between the lines, a proper fix would:

  1. Rework src/Makefile.global.in to pass -d exactly once in the non-VPATH case and twice (distinct) in VPATH, and to pass --prefix correctly so genhtml consolidates src/include/catalog regardless of whether it came from source or build tree.
  2. Add -fprofile-update=atomic to CFLAGS automatically when -Db_coverage=true / --enable-coverage is set.
  3. Ship a project-level .lcovrc (or inject equivalent flags) codifying the known-unavoidable --ignore-errors set for LCOV 2.x.
  4. Make parallel changes in mesonbuild/scripts/coverage.py, likely requiring an upstream Meson PR, which explains the reluctance: the issue tracker links (mesonbuild/meson#11995, #12345) show this has been partly addressed upstream but not completely.
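The first item in the list (pass -d once when the build and source directories coincide, twice when they differ) comes down to de-duplicating directories by their resolved path. A minimal sketch of that logic, with a hypothetical function name and not taken from any posted patch:

```python
import os


def lcov_directory_args(build_dir, source_dir):
    """Return ['-d', dir, ...] with each distinct directory listed once."""
    args = []
    seen = set()
    for directory in (build_dir, source_dir):
        real = os.path.realpath(directory)
        if real not in seen:
            seen.add(real)
            args += ["-d", directory]
    return args


if __name__ == "__main__":
    # In-tree build: both names resolve to the same directory.
    print(lcov_directory_args(".", "."))
    # VPATH build with hypothetical paths: two distinct directories.
    print(lcov_directory_args("/work/build", "/work/postgres"))
```

Comparing resolved paths instead of the literal strings also catches the case where the source directory is spelled as an absolute path and the build directory as `.`.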

None of this work happens in the thread. The thread ends as a living FAQ of workarounds rather than a resolution.