Incremental Update: Filip's Clarification on Benchmark Methodology
Filip Janus responds to Tomas Vondra's follow-up questions, providing important clarifications but no new patch or fundamentally new technical content.
Key Clarifications
- COMPRESS_BLCKSZ definition confirmed: The 32KB compression block size is a `#define COMPRESS_BLCKSZ (4 * BLCKSZ)` in buffile.c, introduced in the latest patch revision. It is a compile-time constant, not a GUC.
- Benchmark methodology mismatch acknowledged: Filip explicitly admits the comparison between his results and Tomas's was "not entirely apples-to-apples", since Tomas benchmarked the January patch (8KB blocks), while Filip's main results used the updated patch with 32KB blocks. He quantifies the block-size contribution: for the lz4 d=1000 w=8 HDD case, 8KB gave 58% while 32KB gave 52%, so the block size accounts for ~6 percentage points, with the larger difference attributable to storage/memory pressure differences between machines.
- Per-session activation endorsed: Filip agrees with Tomas's concern about global enablement. He confirms `temp_file_compression` supports `SET` at session level, suggesting applications could enable it only for known-heavy queries. He acknowledges the worst case on fast NVMe (up to ~135% for lz4 with w=1) makes per-session/per-query activation more appropriate than global setting for systems with ample RAM and fast storage.
- Data compressibility coverage: Filip notes the `d` parameter in the benchmark already covers a range of redundancy (d=1 least compressible, d=1000 most), and expects real-world workloads with wider rows, more NULLs, or variable-length fields would often compress better than the benchmark's compact schema (bigint + md5 text).
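The block-size arithmetic in the first two points can be sketched in C. This is a minimal illustration, not code from the patch: it assumes PostgreSQL's default BLCKSZ of 8192 bytes (builds configured with a different page size would get a different result), and the helper block_size_contribution_pp is a hypothetical name introduced here only to show the percentage-point calculation.

```c
#include <stddef.h>

/* Assumption: PostgreSQL's default page size of 8 KB. BLCKSZ is fixed at
 * build time, so a non-default build would change every value below. */
#define BLCKSZ 8192

/* As quoted in the thread: one compression block spans four pages. */
#define COMPRESS_BLCKSZ (4 * BLCKSZ)

/* Because the value is a compile-time constant (not a GUC), it can be
 * checked at compile time: 4 * 8192 = 32768 bytes = 32KB. */
_Static_assert(COMPRESS_BLCKSZ == 32 * 1024,
               "COMPRESS_BLCKSZ should be 32KB with the default BLCKSZ");

/* Hypothetical helper (not in the patch): percentage-point gap between the
 * result measured with 8KB blocks and the one measured with 32KB blocks. */
static inline int
block_size_contribution_pp(int pct_8kb, int pct_32kb)
{
    return pct_8kb - pct_32kb;
}
```

Calling block_size_contribution_pp(58, 52) with the lz4 d=1000 w=8 HDD figures quoted above yields the ~6 percentage points Filip attributes to block size.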
Assessment
This message is primarily a clarification and housekeeping response rather than a technical advancement. It contains no new patch version, no new benchmark runs, and no new architectural decisions. The thread appears to be waiting for further review or testing suggestions from Tomas.