Proposal: Adding compression of temporary files

First seen: 2024-11-14 22:13:16+00:00 · Messages: 35 · Participants: 8

Latest Update

2026-05-27 · claude-opus-4-6

Incremental Update: Filip's Clarification on Benchmark Methodology

Filip Janus responds to Tomas Vondra's follow-up questions, providing important clarifications but no new patch or fundamentally new technical content.

Key Clarifications

  1. COMPRESS_BLCKSZ definition confirmed: The compression block size is defined in buffile.c as #define COMPRESS_BLCKSZ (4 * BLCKSZ), which is 32KB with the default 8KB BLCKSZ. It was introduced in the latest patch revision and is a compile-time constant, not a GUC.

  2. Benchmark methodology mismatch acknowledged: Filip concedes that the comparison between his results and Tomas's was "not entirely apples-to-apples." Tomas benchmarked the January patch (8KB blocks), while Filip's main results used the updated patch with 32KB blocks. He quantifies the block-size contribution for the lz4 d=1000 w=8 HDD case: 8KB gave 58% and 32KB gave 52%, so block size accounts for ~6 percentage points. He attributes the larger share of the gap to storage and memory-pressure differences between the two machines.

  3. Per-session activation endorsed: Filip agrees with Tomas's concern about enabling compression globally. He confirms that temp_file_compression can be changed with SET at the session level, so applications could enable it only for queries known to spill heavily. He acknowledges that the worst case on fast NVMe (up to ~135% for lz4 with w=1) makes per-session or per-query activation more appropriate than a global setting on systems with ample RAM and fast storage.

  4. Data compressibility coverage: Filip notes the d parameter in the benchmark already covers a range of redundancy (d=1 least compressible, d=1000 most), and expects real-world workloads with wider rows, more NULLs, or variable-length fields would often compress better than the benchmark's compact schema (bigint + md5 text).
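
To make the block-size point in item 1 concrete, here is a minimal sketch, assuming a default build where BLCKSZ is 8192 bytes. Only the COMPRESS_BLCKSZ definition is taken from the thread; the helper compress_blocks_for is hypothetical and is not part of the patch.

```c
#include <stddef.h>

/* Assumption: default PostgreSQL build, where BLCKSZ is 8192 bytes. */
#ifndef BLCKSZ
#define BLCKSZ 8192
#endif

/* As quoted in the thread: a compile-time constant, not a GUC. */
#define COMPRESS_BLCKSZ (4 * BLCKSZ)

/* Fails the build if the constant is not 32KB under the assumption above. */
_Static_assert(COMPRESS_BLCKSZ == 32768, "expected a 32KB compression block");

/*
 * Hypothetical helper (not from the patch): number of compression blocks
 * needed to cover nbytes of temporary-file data, rounding up.
 */
static size_t
compress_blocks_for(size_t nbytes)
{
    return (nbytes + COMPRESS_BLCKSZ - 1) / COMPRESS_BLCKSZ;
}
```

Because the value is fixed at compile time, comparing 8KB and 32KB blocks, as Filip did, means rebuilding with a different multiplier rather than changing a setting at run time.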

Assessment

This message is primarily a clarification and housekeeping response rather than a technical advancement. It contains no new patch version, no new benchmark results, and no new architectural decisions. The thread appears to be waiting for further review or testing suggestions from Tomas.