Reject HEADER with binary and json COPY formats by option presence

First seen: 2026-05-31 01:57:06+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-06-01 · claude-opus-4-6

Technical Analysis: Reject HEADER with binary and json COPY formats by option presence

Core Problem

The thread identifies an inconsistency between PostgreSQL's documentation and its implementation of how the HEADER option interacts with the binary and json COPY formats. The documentation states categorically that "This option is not allowed when using binary or json format," which implies that the mere presence of the option should raise an error. The implementation, however, rejects the option only according to its value: HEADER '0' (a falsy value) is silently accepted, while HEADER '1' (a truthy value) is correctly rejected.

Architectural Significance

This is fundamentally a question about option validation semantics in PostgreSQL's command processing layer. The issue touches on a broader design principle: should PostgreSQL validate options based on their presence (syntactic) or their semantic effect (value-based)?

The distinction matters because:

  1. Documentation fidelity: If the docs say an option "is not allowed," users expect any specification of it to fail, regardless of value.
  2. Consistency across subsystems: The thread demonstrates that VACUUM already follows presence-based validation (e.g., BUFFER_USAGE_LIMIT 0 with FULL is rejected even though 0 effectively disables the strategy). COPY should follow the same pattern.
  3. Future extensibility: Allowing falsy values to pass silently creates a precedent where other options might inconsistently accept "no-op" values in invalid contexts, making behavior harder to reason about.

Technical Details of the Bug

The validation code in the COPY path currently checks the effective value of the header option (likely by testing whether the parsed header setting is nonzero) rather than the header_specified boolean flag that already exists in the parsing infrastructure. The header_specified variable is set whenever the HEADER option appears in the command syntax, regardless of its value, so it already carries the information a presence-based check needs.
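The separation between "was the option given" and "what does it evaluate to" can be modeled in a few lines. The following is a minimal standalone sketch, not PostgreSQL source: the DemoOption and DemoParseResult types and the demo_* functions are hypothetical names invented for illustration, and the boolean parsing is deliberately simplified compared with what the server accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one "name value" option pair; not a PostgreSQL type. */
typedef struct DemoOption
{
    const char *name;
    const char *value;
} DemoOption;

/* Hypothetical parse result: effective value and presence are tracked separately. */
typedef struct DemoParseResult
{
    bool header_line;       /* effective value of HEADER */
    bool header_specified;  /* true if HEADER appeared at all */
} DemoParseResult;

/* Simplified boolean parsing (assumption): "1", "true" and "on" mean true. */
static bool
demo_parse_bool(const char *value)
{
    return strcmp(value, "1") == 0 ||
           strcmp(value, "true") == 0 ||
           strcmp(value, "on") == 0;
}

/* Walk the option list and record both the value and the presence of "header". */
static DemoParseResult
demo_parse_options(const DemoOption *opts, int nopts)
{
    DemoParseResult result = {false, false};

    assert(opts != NULL || nopts == 0);

    for (int i = 0; i < nopts; i++)
    {
        if (strcmp(opts[i].name, "header") == 0)
        {
            /* Presence is recorded independently of how the value parses. */
            result.header_specified = true;
            result.header_line = demo_parse_bool(opts[i].value);
        }
    }
    return result;
}
```

In this model, passing header '0' yields header_line == false but header_specified == true, which is exactly the case a value-based check lets through.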

Current (incorrect) behavior, shown with file_fdw foreign table options:

-- Rejected (correct): header is truthy in binary mode
CREATE FOREIGN TABLE ft (...) OPTIONS (format 'binary', header '1');
-- ERROR: cannot specify HEADER in BINARY mode

-- Accepted (incorrect): header is falsy in binary mode
CREATE FOREIGN TABLE ft (...) OPTIONS (format 'binary', header '0');
-- CREATE FOREIGN TABLE (should error)

Expected behavior after fix:

Both should be rejected because the HEADER option is present, regardless of its value being effectively a no-op.

Proposed Solution

The fix is described as straightforward: change the validation check from inspecting the header option's value to checking the header_specified flag. This flag is already tracked during option parsing, making the implementation minimal and low-risk.

The relevant code path is the COPY option validation (likely in src/backend/commands/copy.c or src/backend/commands/copyfrom.c and src/backend/commands/copyto.c), where incompatible combinations are checked once all options have been parsed. The condition guarding the "cannot specify HEADER in BINARY mode" error would change from something like:

if (opts->header_line && opts->binary)
    ereport(ERROR, ...);

to:

if (header_specified && opts->binary)
    ereport(ERROR, ...);

And similarly for the json format check.
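To make the difference between the two conditions concrete for both formats, here is a small standalone model. It is a sketch under stated assumptions rather than the actual patch: DemoFormat, DemoCopyOptions and the demo_* functions are hypothetical, and the real code reports failures through ereport instead of returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical format enum; the real option struct is laid out differently. */
typedef enum DemoFormat
{
    DEMO_TEXT,
    DEMO_CSV,
    DEMO_BINARY,
    DEMO_JSON
} DemoFormat;

/* Hypothetical, flattened view of the parsed options. */
typedef struct DemoCopyOptions
{
    DemoFormat format;
    bool header_line;       /* effective value of HEADER */
    bool header_specified;  /* true if HEADER appeared at all */
} DemoCopyOptions;

/* Formats for which the documentation says HEADER is not allowed. */
static bool
demo_header_forbidden(DemoFormat format)
{
    return format == DEMO_BINARY || format == DEMO_JSON;
}

/* Value-based check (current behavior as described): a false HEADER passes. */
static bool
demo_rejects_by_value(const DemoCopyOptions *opts)
{
    assert(opts != NULL);
    return demo_header_forbidden(opts->format) && opts->header_line;
}

/* Presence-based check (proposed behavior): any HEADER is rejected. */
static bool
demo_rejects_by_presence(const DemoCopyOptions *opts)
{
    assert(opts != NULL);
    return demo_header_forbidden(opts->format) && opts->header_specified;
}
```

For an option set with format binary and header '0', demo_rejects_by_value returns false while demo_rejects_by_presence returns true. For text or csv, and for commands that never mention HEADER, both checks accept, so the change only affects inputs the documentation already describes as invalid.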

Broader Context

The author references a prior thread ([1]) about a similar issue with the COPY command directly (as opposed to file_fdw), suggesting this pattern of value-based rather than presence-based validation may exist in multiple code paths. The issue was discovered while testing the "file_fdw: Support multi-line HEADER option" patch, indicating active development in this area that could compound the inconsistency if left unaddressed.

Design Tradeoff

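There is a tension between value-based validation, which tolerates a setting such as HEADER '0' because it has no effect, and presence-based validation, which rejects the option whenever it appears.

The argument for presence-based rejection is stronger because: (a) it matches the documentation, (b) it matches precedent in other commands such as VACUUM, and (c) it prevents tools and scripts from accidentally including meaningless options that might later cause confusion during debugging or migrations.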