COPY TO BLACKHOLE / pg_dump -j -Fb
Technical Analysis
Core Problem
This thread, initiated by Jakub Wartak (EDB), proposes the concept of a "COPY TO BLACKHOLE" facility — essentially a mechanism to discard output from COPY TO operations rather than writing it to a file or sending it to a client. The context appears tied to parallel pg_dump (-j) with the custom/directory format (-Fb or -Fd), where benchmarking and testing scenarios benefit from eliminating I/O as a bottleneck.
Why This Matters Architecturally
The motivation likely stems from several interconnected concerns:
-
Benchmarking pg_dump performance: When profiling parallel dump operations, it's often necessary to isolate server-side data extraction costs from client-side I/O costs. A blackhole destination allows measuring the pure overhead of tuple deformation, COPY formatting, and network transfer without filesystem writes contaminating results.
-
Parallel dump scalability testing: With
pg_dump -j(parallel jobs), the limiting factor can shift between server CPU, network bandwidth, and client disk I/O. A blackhole sink helps identify which component is the true bottleneck. -
COPY protocol internals: PostgreSQL's COPY TO path involves tuple slot access, output function calls for type conversion, and protocol-level formatting (text/binary). Understanding where time is spent requires the ability to eliminate downstream costs.
Technical Context
The COPY TO execution path in PostgreSQL involves:
DoCopy()→DoCopyTo()insrc/backend/commands/copy.c(or the refactoredsrc/backend/commands/copytto.cin newer versions)- Output is sent via
CopySendEndOfRow()which routes to either a file descriptor or the frontend protocol - For pg_dump, data flows: server COPY → libpq protocol → pg_dump process → compress → write to file
A "blackhole" could be implemented at several layers:
- Server-side: A special COPY TO destination that discards after formatting (still exercises type output functions)
- Client-side: pg_dump discards received data instead of writing (exercises full protocol path)
- OS-level: Writing to
/dev/null(exercises full path including syscalls)
Current State
With only the initial message in the thread, this appears to be a proposal/RFC at an early stage. The thread title suggests the proposal may combine:
- A server-side
COPY TO BLACKHOLEsyntax extension - Integration with pg_dump's parallel job infrastructure (
-j) and the custom format (-Fbwhich likely refers to a binary/blackhole format variant)
Design Considerations
Key tradeoffs that would need to be addressed:
- Security implications: A COPY TO BLACKHOLE still requires read permissions on the source table and exercises all row-level security policies
- WAL implications: None, since this is read-only
- Statistics/monitoring: Should
pg_stat_progress_copystill track rows processed? - Use cases beyond benchmarking: Could serve as a validation mechanism (verify COPY doesn't error without needing storage)
Relationship to Existing Infrastructure
PostgreSQL already has precedents for "null" destinations:
EXPLAIN (ANALYZE)executes queries but discards results/dev/nullas a COPY TO FILE destination on Unix systems (requires superuser or appropriate privileges)- The
pg_dump --no-dataflag skips data entirely but doesn't exercise the data path
The proposal likely seeks a more principled, cross-platform, and permission-friendly approach compared to COPY TO '/dev/null'.