Plug-in Coverage Hole for pglz_decompress()
Core Problem
This thread addresses a test coverage gap in PostgreSQL's built-in LZ compression/decompression implementation (pg_lzcompress.c), specifically in the pglz_decompress() function. The issue was discovered during a post-commit review of commit 67d318e70402, which had introduced a test_pglz_decompress() testing function (likely as part of expanding regression test infrastructure for compression paths).
The Uncovered Code Path
Inside pglz_decompress(), there is a critical safety guard that validates back-reference offsets during decompression:
if (unlikely(off == 0 ||
off > (dp - (unsigned char *) dest)))
return -1;
In the pglz compression format, compressed data contains back-references that say "copy N bytes from offset O behind the current output position." This check guards against two forms of corrupt input:
off == 0: A zero offset is nonsensical — it would mean "copy from the current position," which is undefined in the compression scheme.off > (dp - (unsigned char *) dest): The offset exceeds the amount of output already written, meaning it references memory before the output buffer — a potential out-of-bounds read if not caught.
Both conditions cause pglz_decompress() to return -1, signaling corrupt/invalid compressed data. Despite this being a security-relevant validation (preventing buffer over-reads on malformed input), no existing test case exercised either branch, leaving it as a gap visible in the gcov coverage reports.
Proposed Solution
Michael Paquier's fix is minimal and precise: add two carefully crafted bytea literals to the existing regression tests that call test_pglz_decompress():
SELECT test_pglz_decompress('\x011001'::bytea, 1024, true);
SELECT test_pglz_decompress('\x010300'::bytea, 1024, true);
Decoding the Test Inputs
These are hand-crafted invalid pglz compressed byte sequences:
\x011001: This constructs a compressed stream where the back-reference offset is larger than the output written so far, triggering theoff > (dp - (unsigned char *) dest)branch.\x010300: This constructs a compressed stream where the back-reference offset is zero, triggering theoff == 0branch.
Both cases exercise the return -1 error path, which the test_pglz_decompress() wrapper (with the true parameter likely indicating "expect failure" or "allow error") translates into a testable outcome.
Scope and Backport Strategy
The fix targets all supported branches down to v14, consistent with PostgreSQL's policy that the test infrastructure change introduced by 67d318e70402 was itself backported. Michael notes the fix must wait until after the current minor release cycle is tagged, indicating awareness of the release management process and desire not to introduce even trivial test changes into an imminent release.
Architectural Significance
While this is a small patch, it touches on several important themes:
-
Defense-in-depth for compression:
pglz_decompress()can be called on data read from disk (TOAST compressed values, for instance). Corrupt on-disk data must never cause undefined behavior. The guard being tested prevents a potential out-of-bounds memory read. -
Coverage-driven testing discipline: PostgreSQL's use of gcov coverage reports (published at
coverage.postgresql.org) to systematically identify untested code paths represents a mature software engineering practice. This thread demonstrates the workflow: review coverage → identify gap → craft minimal reproducer → backport. -
Post-commit review value: The coverage hole was found during post-commit review of
67d318e70402, illustrating how the community's review process catches issues even after initial commit.
Technical Design Decisions
There are no design disagreements in this thread — the fix is straightforward and uncontroversial. The key decisions are:
- Hand-crafted binary inputs rather than fuzzing: The test cases use specific bytea literals that deterministically trigger each branch, rather than relying on randomized/fuzz testing. This is appropriate for regression tests that must be deterministic and fast.
- Backport to v14: Coverage improvements are backported to all supported branches, maintaining test parity across versions.
- Timing deference to release cycle: Rather than rushing the fix in before a minor release, Michael explicitly defers to after the release, showing appropriate caution even for test-only changes.