COPY: validate option presence rather than option values

First seen: 2026-05-06 04:25:32+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-06 · opus 4.7

Core Problem

This thread reports a minor but meaningful correctness issue in the COPY command's option validation logic. The documented contract of several COPY options is that they are only allowed under specific conditions (e.g., FORCE_ARRAY only with FORMAT json, ESCAPE only in CSV mode). However, the actual implementation conflates two different questions:

  1. Was the option specified at all? (presence)
  2. What value did the option evaluate to? (semantics)

The current code in ProcessCopyOptions / BeginCopyTo checks the effective value of boolean flags like opts_out->force_array rather than whether the user wrote FORCE_ARRAY in the option list. The consequence: COPY t1 TO stdout (FORMAT csv, FORCE_ARRAY false) silently succeeds because the resulting flag is false, even though specifying FORCE_ARRAY at all is documented as disallowed outside JSON format.
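
The gap between the two questions can be sketched in a few lines of standalone C. This is not PostgreSQL source: the DemoCopyOpts struct, its force_array_specified field, and both check functions are hypothetical stand-ins, assumed here only to show why a value-based test lets FORCE_ARRAY false through.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the parsed COPY options. */
typedef struct DemoCopyOpts
{
    bool json_mode;             /* FORMAT json was selected */
    bool force_array;           /* effective value of FORCE_ARRAY */
    bool force_array_specified; /* FORCE_ARRAY appeared in the option list */
} DemoCopyOpts;

/* Value-based check: rejects only when the flag evaluated to true. */
static bool
value_check_rejects(const DemoCopyOpts *opts)
{
    return opts->force_array && !opts->json_mode;
}

/* Presence-based check: rejects whenever the option was written at all. */
static bool
presence_check_rejects(const DemoCopyOpts *opts)
{
    return opts->force_array_specified && !opts->json_mode;
}
```

For (FORMAT csv, FORCE_ARRAY false), the struct would hold force_array = false and force_array_specified = true, so only the presence-based check rejects the statement.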

Why this matters architecturally

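COPY's option-parsing subsystem is a long-standing source of friction. Options accrete over releases (ON_ERROR and LOG_VERBOSITY in PG17, REJECT_LIMIT in PG18, FORMAT json and FORCE_ARRAY more recently), and each option carries per-format compatibility rules. The validation pattern has historically been: parse all options into a CopyFormatOptions struct first, then run a series of cross-option consistency checks against the parsed values. That pattern has two failure modes demonstrated here:

  1. An option set to its no-op value passes a check that was meant to forbid the option entirely (FORCE_ARRAY false under FORMAT csv).
  2. Per-option value parsing runs before the compatibility checks, so the user can get a value error for an option that would be rejected in that format regardless of its argument (ESCAPE under FORMAT json).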

Proposed Solution

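The reporter's patch changes the validation strategy from value-based to presence-based for options whose very appearance is conditional. Concretely this requires tracking which options were syntactically present in the COPY statement, separately from the parsed values they produced. In practice this is done either by:

  1. Adding a per-option _specified flag that is set when the option is encountered while parsing the list.
  2. Making a separate pass over the option list to see which option names appear.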

The first approach is more idiomatic in the PostgreSQL code base (compare log_verbosity_specified-style patterns elsewhere) and keeps the option parsing single-pass.
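
A minimal sketch of the flag-per-option approach, assuming a toy option list in place of the real List * of DefElem nodes. DemoOption, DemoParsed, demo_process_options, and the FORCE_ARRAY error wording are all hypothetical; the sketch also assumes every option carries a non-NULL argument string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DefElem: option name plus raw argument. */
typedef struct DemoOption
{
    const char *name;
    const char *arg;            /* assumed non-NULL in this sketch */
} DemoOption;

typedef struct DemoParsed
{
    bool json_mode;
    bool force_array;
    bool force_array_specified;
} DemoParsed;

/* Single pass over the options; returns NULL on success, else a message. */
static const char *
demo_process_options(const DemoOption *options, size_t n, DemoParsed *out)
{
    memset(out, 0, sizeof(*out));

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(options[i].name, "format") == 0)
            out->json_mode = (strcmp(options[i].arg, "json") == 0);
        else if (strcmp(options[i].name, "force_array") == 0)
        {
            /* Record presence separately from the parsed value. */
            out->force_array_specified = true;
            out->force_array = (strcmp(options[i].arg, "true") == 0);
        }
    }

    /* Cross-option check keyed on presence, not on the value. */
    if (out->force_array_specified && !out->json_mode)
        return "COPY FORCE_ARRAY requires JSON mode";
    return NULL;
}
```

Because the compatibility check runs after the loop, the result does not depend on whether FORMAT or FORCE_ARRAY comes first in the option list.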

The reporter also argues for reordering validation: presence-based compatibility checks (e.g., "ESCAPE given but format is not CSV") should fire before per-option value parsing is attempted. This changes ESCAPE's error from "escape requires a parameter" to "COPY ESCAPE requires CSV mode" for the JSON case, which is more actionable.
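
The proposed ordering can be illustrated with a small standalone function. demo_check_escape is a hypothetical helper, not a PostgreSQL function; the two message strings are the ones quoted above, and a NULL arg is assumed to mean that ESCAPE was written without a value.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical validator for a single ESCAPE option that the user wrote.
 * Returns NULL if the option is acceptable, else an error message.
 */
static const char *
demo_check_escape(bool csv_mode, const char *arg)
{
    /* The presence-based compatibility check comes first. */
    if (!csv_mode)
        return "COPY ESCAPE requires CSV mode";

    /* Only then is the argument itself examined. */
    if (arg == NULL)
        return "escape requires a parameter";

    return NULL;
}
```

With the two checks in the opposite order, a bare ESCAPE under FORMAT json would report the missing parameter, and supplying one would only lead to a second error.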

Options affected

Based on the reporter's description of doing "a thorough pass," the likely affected options are those with format-conditional rules, such as FORCE_ARRAY (JSON only) and ESCAPE (CSV only), the two cases named in the thread.

Key Technical Insights

1. The DefElem-list vs. parsed-struct tension

PostgreSQL's options-processing convention parses a List * of DefElem nodes into a typed options struct as a first step, then operates on the struct. This is clean and encourages uniform handling, but it erases the distinction between "user omitted the option" and "user specified the option with its default value." For options where that distinction is semantically meaningful, as with COPY's conditionally-allowed flags, the struct representation is lossy. The patch essentially recovers that information by adding presence flags.

2. Error-ordering is part of the API contract

The ESCAPE-in-JSON example illustrates that the sequence in which errors are reported is a user-facing contract, not an implementation detail. Reporting a syntactic error before a semantic one is defensible only when fixing the syntactic error could plausibly make the statement valid. When it cannot (ESCAPE will never be accepted in JSON mode regardless of its argument), reporting the semantic error first is strictly better UX.

3. Backwards-compatibility risk

Tightening validation is technically a behavior change: queries like (FORMAT csv, FORCE_ARRAY false) that succeed today will start erroring. This is almost certainly the right change — such queries are nonsensical and the documentation has always said so — but it means the fix is a master-only change, not backpatchable, and release notes should flag it. Any reviewer response will likely probe this: does any tool (pg_dump, ETL software, ORM) emit redundant COPY options that would break?

4. Scope discipline

The reporter correctly framed this as a narrow, mechanical cleanup rather than a redesign of COPY options. The patch surface is contained to copy.c / copyto.c / copyfrom.c option validation; there are no catalog, WAL, or planner changes. This is the right scope for a first-time contribution to land.

Participant Dynamics

At the point captured, this is a single initial post from Evan Li — no committer has yet weighed in. The typical reviewer pool for COPY option changes includes Andres Freund, Michael Paquier, Tom Lane, Masahiko Sawada, and Sutou Kouhei (who has been leading the COPY FORMAT extensibility work). Masahiko Sawada authored much of the ON_ERROR/REJECT_LIMIT infrastructure and would be a natural reviewer for consistency with that code. Tom Lane tends to weigh in on error-message wording and ordering questions.

Likely Review Trajectory

Expected review points:

  1. Should this be backpatched? Likely no, on the "behavior change" principle, even though the accepted-but-wrong queries are bugs.
  2. Is a _specified flag per option the right representation, or should there be a more general mechanism (e.g., a bitmap of specified options)?
  3. Test coverage — the patch should extend src/test/regress/sql/copy2.sql with explicit negative tests for each option/format combination, both with the option set to its no-op value and with valid values.
  4. Documentation — no doc changes are strictly required since the docs already say these options are restricted; the code is catching up to the docs.
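
For review point 2, the bitmap alternative might look roughly like the sketch below. Every name in it (DemoOptBit, DemoFormat, demo_allowed_mask, demo_disallowed) is hypothetical, and only the two options discussed in the thread are modeled; it is meant to show the shape of the idea, not a proposed patch.

```c
#include <stdint.h>

/* Hypothetical option identifiers, one bit each. */
typedef enum DemoOptBit
{
    DEMO_OPT_ESCAPE      = 1u << 0,
    DEMO_OPT_FORCE_ARRAY = 1u << 1
} DemoOptBit;

typedef enum DemoFormat { DEMO_TEXT, DEMO_CSV, DEMO_JSON } DemoFormat;

/* Options whose mere presence is allowed for a given format. */
static uint32_t
demo_allowed_mask(DemoFormat fmt)
{
    switch (fmt)
    {
        case DEMO_CSV:  return DEMO_OPT_ESCAPE;
        case DEMO_JSON: return DEMO_OPT_FORCE_ARRAY;
        default:        return 0;
    }
}

/* Returns the bits that were specified but are not allowed (0 if none). */
static uint32_t
demo_disallowed(uint32_t specified, DemoFormat fmt)
{
    return specified & ~demo_allowed_mask(fmt);
}
```

The trade-off is that per-format rules collapse into one mask lookup, while per-option _specified flags keep each check, and its error message, next to the option it concerns.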