Core Problem
pg_dump's text format composes trivially with Unix pipelines (pg_dump ... | lz4 | pv | ssh ...), but the directory format (-Fd) requires a filesystem directory as the output destination. It is the only format that supports parallel dump (-j N), and parallel restore is otherwise available only from a seekable custom-format (-Fc) file, not from a pipe. This creates a capability gap:
- Text/custom format + pipes: flexible post-processing (custom compression, throttling, SSH streaming, cloud upload) but single-threaded.
- Directory format: parallel, but output is locked to local files; the only post-processing hook is the built-in --compress flag (gzip/lz4/zstd).
For very large databases this forces operators into an unhappy choice: spend hours on a single-threaded piped dump, or spend 2× the disk to stage a parallel directory dump before shipping/compressing it. Andrew Jackson highlights the practical pain: you cannot stream a parallel pg_dump directly into a parallel pg_restore today.
Proposed Solution
Introduce --pipe-command (later renamed to --pipe) to pg_dump and pg_restore. Rather than opening files with fopen/fclose, the archiver opens a subprocess with popen/pclose whose command template contains a %f placeholder expanded per-file (matching the directory-format filenames: toc.dat, <dumpid>.dat, blob_NNN.toc, blob_<oid>.dat).
Example round-trip with FIFOs enabling streaming parallel dump→restore:
pg_dump -j4 -Fd src --pipe-command="mkfifo dumpdir/%f; cat >> dumpdir/%f"
pg_restore -j4 -Fd --dbname=dst ./dumpdir
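The per-file %f expansion described above can be sketched in C. This is a minimal illustration, not the patch's code: the function name expand_pipe_template is hypothetical, and treating %% as a literal percent sign and rejecting unknown sequences are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the patch's code): replace every "%f" in a
 * command template with the archive file name. "%%" yields a literal '%'.
 * Returns a malloc'd string the caller must free, or NULL on allocation
 * failure or on an unknown or trailing '%' sequence.
 */
char *
expand_pipe_template(const char *tmpl, const char *fname)
{
    assert(tmpl != NULL && fname != NULL);

    size_t flen = strlen(fname);
    size_t cap = 1;             /* room for the terminating NUL */

    /* First pass: validate the template and compute the output size. */
    for (const char *p = tmpl; *p; p++)
    {
        if (*p != '%')
            cap += 1;
        else if (p[1] == 'f')
        {
            cap += flen;
            p++;
        }
        else if (p[1] == '%')
        {
            cap += 1;
            p++;
        }
        else
            return NULL;        /* unknown or trailing '%' */
    }

    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    /* Second pass: copy literal text and substitute placeholders. */
    char *q = out;
    for (const char *p = tmpl; *p; p++)
    {
        if (*p != '%')
            *q++ = *p;
        else if (p[1] == 'f')
        {
            memcpy(q, fname, flen);
            q += flen;
            p++;
        }
        else                    /* "%%", validated in the first pass */
        {
            *q++ = '%';
            p++;
        }
    }
    *q = '\0';
    return out;
}
```

Called once per archive member, the same template yields one command per file, for example "mkfifo dumpdir/%f; cat >> dumpdir/%f" with toc.dat becomes "mkfifo dumpdir/toc.dat; cat >> dumpdir/toc.dat".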
Implementation shape
- A new boolean fSpecIsPipe on _archiveHandle (and analogous locals in pg_dump.c/pg_restore.c) flags the output/input target as a program rather than a path.
- The existing filename-carrying fields are overloaded to carry the command template when the flag is set.
- A helper (replace_percent_placeholders) expands %f against the per-entry filename that the directory format would otherwise have used.
- All fopen sites in the directory archiver are routed through a conditional that picks popen when the pipe flag is set. Close paths likewise route to pclose.
- Mutually exclusive with --file, and (per v4+) incompatible with the builtin --compress (since compression is delegated to the user's pipeline).
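The open/close routing in the list above might look roughly like the following. This is a sketch under stated assumptions, not the patch itself: the Sink struct and the sink_open_write/sink_close names are hypothetical, and the policy of reporting the command's exit status from the close path is an illustration of what pclose makes possible rather than something the thread confirms.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/wait.h>

/* Illustrative handle; the field names are assumptions, not the patch's. */
typedef struct
{
    FILE *fp;
    bool  is_pipe;
} Sink;

/*
 * Open `target` for writing. When is_pipe is set, `target` is an already
 * expanded shell command run through popen(); otherwise it is a file path.
 */
bool
sink_open_write(Sink *s, const char *target, bool is_pipe)
{
    assert(s != NULL && target != NULL);

    s->is_pipe = is_pipe;
    s->fp = is_pipe ? popen(target, "w") : fopen(target, "wb");
    return s->fp != NULL;
}

/*
 * Close the sink. Returns 0 on success. For a pipe, returns the command's
 * exit status, or -1 if the status is unavailable or the command was
 * terminated by a signal. For a file, returns -1 if fclose() fails.
 */
int
sink_close(Sink *s)
{
    assert(s != NULL && s->fp != NULL);

    int rc;

    if (s->is_pipe)
    {
        int status = pclose(s->fp);

        if (status == -1 || !WIFEXITED(status))
            rc = -1;
        else
            rc = WEXITSTATUS(status);
    }
    else
        rc = (fclose(s->fp) == 0) ? 0 : -1;

    s->fp = NULL;
    return rc;
}
```

Keeping the flag next to the FILE pointer means every close site can pick pclose or fclose without consulting global state, and a failing user command surfaces as a nonzero return instead of being silently dropped.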
Key Design Tensions
Append-mode for LO TOC
The existing code opens blob_NNN.toc in append mode. popen has no append semantics—a child process is spawned once, writes, and exits. Nitin's v4+ response is to change the LO TOC open mode to PG_BINARY_W unconditionally, even in the non-pipe case, so the two code paths converge. This is a subtle behavioral change to the existing directory format and deserves scrutiny: any caller relying on append semantics (e.g., resumable dumps, though pg_dump doesn't really support that) would be affected. Nitin flags it explicitly: "If there is a concern, we can revert to the older version."
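The mismatch between fopen and popen modes can be made concrete with a small mapping function. POSIX defines only "r" and "w" for popen, so an append-mode open has no pipe equivalent. The function name popen_mode_for is hypothetical; the sketch accepts both the plain and the "b"-suffixed spellings because PostgreSQL's PG_BINARY_* macros expand to one or the other depending on platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map an fopen() mode to the popen() mode that can
 * stand in for it, or return NULL when no equivalent exists.
 */
const char *
popen_mode_for(const char *fopen_mode)
{
    assert(fopen_mode != NULL);

    if (strcmp(fopen_mode, "r") == 0 || strcmp(fopen_mode, "rb") == 0)
        return "r";
    if (strcmp(fopen_mode, "w") == 0 || strcmp(fopen_mode, "wb") == 0)
        return "w";

    /* "a", "ab", "r+", and so on: a pipe cannot append or seek. */
    return NULL;
}
```

Switching the LO TOC open to write mode moves it from the NULL branch to the "w" branch, which is why the change lets the pipe and non-pipe paths share one call site.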
Shell injection surface
popen invokes /bin/sh -c, so the %f substitution is the critical point. Directory-format filenames are generated internally (not user data), but the command string itself is user-supplied and executed via the shell—this matches COPY ... PROGRAM semantics, which already sets the precedent for "superuser/operator trusts themselves." v7 adds "shell escaping in the command before setting it as the file path," addressing paths with spaces/quotes in %f expansion.
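One common way to make a substituted value safe for /bin/sh is to wrap it in single quotes and rewrite each embedded single quote as '\''. The sketch below shows that technique only; the thread does not spell out what v7's escaping does, so this is not a description of the patch, and the name shell_quote is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: quote `s` for a POSIX shell by wrapping it in
 * single quotes and rewriting each embedded ' as '\''. Returns a malloc'd
 * string the caller must free, or NULL on allocation failure.
 */
char *
shell_quote(const char *s)
{
    assert(s != NULL);

    size_t len = 3;             /* two enclosing quotes plus NUL */

    for (const char *p = s; *p; p++)
        len += (*p == '\'') ? 4 : 1;

    char *out = malloc(len);
    if (out == NULL)
        return NULL;

    char *q = out;

    *q++ = '\'';
    for (const char *p = s; *p; p++)
    {
        if (*p == '\'')
        {
            /* Close the quote, emit an escaped quote, reopen the quote. */
            memcpy(q, "'\\''", 4);
            q += 4;
        }
        else
            *q++ = *p;
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}
```

Quoting only the substituted value, not the whole template, keeps the operator's redirections and pipes working while spaces or quotes in a path can no longer split or terminate the command.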
Flag naming
Hannu anchors the naming in the COPY grammar: COPY ... TO { 'filename' | PROGRAM 'command' | STDOUT }. Candidates floated: --pipe-command, --to-pipe/--from-pipe, --to-program/--from-program, --pipe-command-pattern. Mahendra pushes for the terse --pipe; Hannu concedes. v7 settles on --pipe. The --to-program/--from-program option would have been the most consistent with COPY but was rejected implicitly.
Why no - / stdout convention?
Thomas Munro notes the obvious: the POSIX convention of - meaning stdout cannot work here because the directory format produces multiple files concurrently (especially under -j), so a template with a placeholder is unavoidable.
The defunct-shell test failure
A significant chunk of the thread (April–June 2025, then periodic revisits through early 2026) is consumed by a TAP test problem: commands like --pipe-command="cat > $tempdir/%f" work manually but leave a <defunct> sh child inside 002_pg_dump.pl, with cat reporting "No such file or directory" on a path that demonstrably exists. The symptom pattern—works in shell, fails under Perl's IPC::Run—strongly suggests quoting/argv-vs-shell-string confusion in how the test harness passes the argument, possibly compounded by the embedded > being interpreted by an outer shell layer rather than the inner popen shell. Nitin's eventual v7 fix avoids the problem by using perlbin (making the test portable to Windows at the same time) instead of relying on cat + shell redirection, which sidesteps rather than diagnoses the quoting issue.
pg_dumpall interaction
With PostgreSQL 19's directory-mode support in pg_dumpall, Mahendra asks for --pipe there too. Nitin defers this to a follow-up patch and explicitly skips it for global restore, keeping the initial scope tractable.
Architectural Significance
The patch is small but opens a meaningful extension point: the directory archiver becomes a pluggable transport. Once popen is an accepted sink/source, logical consequences include:
- Cloud-native dumps without staging to local disk (pipe to aws s3 cp - s3://.../%f).
- Custom compression (zstd with tuned levels, xz, application-specific codecs) beyond the three built-ins.
- Streaming dump→restore without a materialized intermediate, using FIFOs as Andrew Jackson demonstrated. This is arguably the most valuable use case because it closes a long-standing gap: a parallel migration that needs neither logical replication nor 2× disk overhead.
- Bandwidth/IO shaping via pv, ionice-wrapped commands, etc.
The design is deliberately minimal: no new transport abstraction, no URI scheme, no plugin API. Just popen with a filename template. This is philosophically in line with PostgreSQL's existing COPY ... PROGRAM and archive_command—leveraging the shell as the extension mechanism.
Review Status
By v7 (May 2026) the patch has had one substantive external reviewer (Mahendra) and light committer-level attention (Thomas Munro weighed in only on the - question; Dilip Kumar asked for a rebase and TODO cleanup but hasn't posted a code-level review). The patch has been rebased repeatedly against HEAD churn. Remaining open items per the latest message: squashing the first three commits, potential revert of the LO TOC open-mode change, and pg_dumpall integration as a follow-up.