COPY ON_CONFLICT TABLE; save duplicated record to another table.

First seen: 2026-04-25 04:12:50+00:00 · Messages: 3 · Participants: 3

Latest Update

2026-05-07 · opus 4.7

New Round: Zsolt Parragi's Crash/Correctness Report

Zsolt Parragi (Percona) contributes the first external reproduction testing of the v20 patch and surfaces four concrete bugs, two of which are outright server crashes. This moves the review from code-reading critique (Jim Jones) to runtime validation, and the findings are significantly more damaging than the stylistic/correctness issues flagged previously.

1. NULL-pointer crash on partitioned targets

A COPY into a range-partitioned root with a pre-existing conflicting row segfaults. This directly confirms the architectural question raised in the prior analysis ("Partitioned targets: what OID goes in copy_tbl?") — the patch doesn't just have an under-specified semantic for partitioning, it has no working code path at all. The likely cause is that the conflict-routing logic derefs resultRelInfo state that only the root-level ResultRelInfo carries, while tuple routing has swapped in a per-leaf ResultRelInfo that lacks the conflict-table wiring. Fixing this requires propagating the conflict-table ResultRelInfo through ExecFindPartition / ExecInitRoutingInfo, analogous to how ON CONFLICT DO NOTHING is threaded per-partition.

2. NULL-pointer crash under REPEATABLE READ

A single-row duplicate inside a REPEATABLE READ transaction crashes. This is consistent with the speculative-insertion machinery assuming READ COMMITTED semantics: under RR, the EvalPlanQual / HeapTupleSelfUpdated paths that ON CONFLICT relies on behave differently, and INSERT ... ON CONFLICT itself historically had to grow explicit handling for serializable/RR isolation. The COPY path apparently omits whatever ExecOnConflictUpdate does to cope with a visible-to-snapshot conflicting tuple under RR, and falls off the end with a NULL TupleTableSlot or similar.

3. Silent data loss with BEFORE ROW triggers on a no-index table

The most alarming report: COPY reports "3 copied" but SELECT count(*) returns 0 on a table with no indexes at all and a no-op BEFORE INSERT trigger. Since there are no unique constraints, no conflict path should ever engage — yet tuples vanish. This strongly suggests the patch's control flow incorrectly routes rows into the "skip main insert" branch whenever some condition tied to trigger presence or ExecOpenIndices(..., true) is met, independent of whether a conflict was actually detected. It may be directly caused by the unconditional speculative-index opening that Jim Jones flagged in the prior round — speculative insertion expects a finalization step (table_tuple_complete_speculative) that is apparently being skipped, leaving tuples in a never-confirmed state that gets pruned. If so, Jim's fix (gate ExecOpenIndices's second arg on on_conflict != ONCONFLICT_NONE) is not just a hygiene fix but load-bearing for data integrity.

4. COPY TO silently accepts the options

Zsolt independently rediscovers exactly the defect Jim Jones identified last round: COPY t TO ... (ON_CONFLICT TABLE, CONFLICT_TABLE ...) succeeds instead of raising. Two independent reviewers hitting the same bug confirms it's a first-impression failure that will continue to embarrass the patch until fixed.

Significance

History (1 prior analysis)
2026-05-06 · opus 4.7

COPY ON_CONFLICT TABLE: Redirecting Duplicate Rows During Bulk Load

Core Problem

PostgreSQL's COPY FROM is the canonical bulk-load path, but it has a long-standing all-or-nothing failure model with respect to constraint violations. When a unique or primary key conflict is encountered mid-load, the entire COPY aborts and the transaction rolls back. For ETL workloads ingesting semi-clean data from external systems (CSV dumps, upstream exports with dedup issues), this forces users into awkward workarounds: pre-staging into an unconstrained table, running INSERT ... ON CONFLICT manually, or using external tools like pg_bulkload.

The TODO referenced in the original post asks specifically for: "Detect duplicate rows and redirect them to a separate table without aborting the load." This is the bulk-load analog of the INSERT ... ON CONFLICT capability that has existed since 9.5, and it dovetails with the existing work (commitfest entry 4817) on ON_ERROR ignore/table for malformed input rows. The two features share infrastructure: both need a user-named sink table, type/shape validation of that table, and a per-row dispatch mechanism in the COPY main loop.

Proposed Design

Jian Universality's v20 patch introduces the syntax:

COPY target FROM ... (ON_CONFLICT TABLE, CONFLICT_TABLE conflict_tbl)

Key design choices:

  1. Reuse of the INSERT ON CONFLICT speculative-insertion machinery. Rather than inventing a new conflict-detection path, the patch drives duplicate detection through the existing ExecOnConflictUpdate/speculative-token infrastructure. This is the correct architectural choice — reimplementing conflict detection at the COPY layer would duplicate a surprisingly subtle piece of the executor that interacts with index AM speculative insertion (ExecInsertIndexTuples with specToken) and HOT chains.

  2. Fixed four-column schema for the conflict table, validated structurally:

    • OID — the target table's OID (copy_tbl)
    • TEXT — source filename (or STDIN)
    • INT8 — line number
    • TEXT — the raw input line

    Validation is done by pg_attribute.atttypid only, not by column name. This mirrors the convention used elsewhere in Postgres for structural log tables and avoids forcing users into specific naming. However, it trades some self-documentation for flexibility — a reviewer could reasonably argue that name-based validation would catch user mistakes earlier.

  3. No new composite type. The four-column contract is inlined rather than materialized as a pg_type entry. This is defensible given the schema is trivial and a catalog type would complicate pg_dump/upgrade paths.

  4. Storing the raw input line rather than the parsed tuple. This is important: if the input had formatting quirks that survived parsing but users want to re-process, the raw line is more useful than a reconstructed one. It also sidesteps having to represent a tuple that may have already been partially transformed (default expressions, domain coercions).

Technical Issues Raised in Review

Jim Jones's review (the first substantive feedback) identifies several correctness and hygiene issues that illuminate the design's rough edges:

Semantic issues

  • COPY TO silently accepts the options. The patch wires the new options into ProcessCopyOptions without gating on direction. CONFLICT_TABLE is meaningless for COPY TO and must raise an error. Jim's suggested fix — threading is_from into defGetCopyOnConflictChoice — is the standard pattern (cf. how FREEZE, HEADER, etc. are validated).

  • Unconditional ExecOpenIndices(resultRelInfo, true). The second argument enables speculative-insertion support on the index entries. Forcing it to true for all COPY FROM paths is a regression: it allocates speculative-insertion state and may affect index AM behavior even when no conflict handling is requested. The correct predicate is cstate->opts.on_conflict != ONCONFLICT_NONE. This is the most consequential bug Jim flags, because it silently changes behavior for every existing COPY user.

  • Redundant guard in CopyFrom(). A nested cstate->opts.on_conflict != ONCONFLICT_NONE check appears inside a branch where that condition is already established. Dead code, but symptomatic of the patch evolving in layers.

  • Duplicate error message text for opposite conditions (conflict_tbl_specified vs !conflict_tbl_specified). Users will see the same message regardless of which option they forgot.

Minor / mechanical

  • Function name typo: defGetdefGetCopyOnConflictChoice (doubled prefix).
  • Dead enum value COPY_ON_ERROR_TABLE — suggests scope creep or incomplete unification with the ON_ERROR patch (commitfest 4817).
  • Multiple spelling errors in comments (realtion, vertification, resouces, unqiue).
  • Unnecessary pnstrdup before CStringGetTextDatum: CStringGetTextDatum already copies into a palloc'd text, so the intermediate pnstrdup leaks and wastes a copy. cstring_to_text_with_len(cstate->line_buf.data, cstate->line_buf.len) is the idiomatic form when you have an explicit length (the line buffer is not guaranteed NUL-terminated-safe for arbitrary binary content, though for text-format COPY it effectively is).

Unresolved Architectural Questions

The author flags one himself: "I need to double-check the exclusion unique constraint." This is non-trivial. INSERT ON CONFLICT explicitly does not support exclusion constraints — only unique indexes and primary keys — because speculative insertion requires the index AM's amcanunique + speculative-token protocol, which GiST/exclusion constraints don't implement. If COPY's new conflict-handling path inherits this limitation, the documentation must state it clearly; if it attempts to handle exclusion constraints through a different mechanism, that's a significant design expansion.

Other questions not yet addressed in the thread:

  • Relationship to commitfest 4817 (ON_ERROR table). The two features should almost certainly share syntax and a common conflict/error-logging table contract. Pursuing them independently risks divergent grammars (ON_ERROR TABLE err_tbl vs ON_CONFLICT TABLE, CONFLICT_TABLE conflict_tbl) that will be painful to reconcile later.
  • Partitioned targets. The copy_tbl column stores the target OID, but if COPY is into a partitioned root, should this be the root OID or the leaf partition where the conflict occurred? The latter is more useful operationally.
  • Permission model. Must the invoking user have INSERT on conflict_tbl? The patch presumably uses the current user's privileges, but this deserves explicit design.
  • Performance. Routing conflicts into another heap per row defeats the multi-insert buffering that makes COPY fast. For heavy-conflict inputs, this could be dramatically slower than the current abort-and-retry pattern. Batching conflict rows would help.

Assessment

The patch is an early-stage proposal with a sensible high-level architecture (reuse ON CONFLICT infrastructure, fixed conflict-table shape) but meaningful correctness bugs (unconditional speculative index opening, missing COPY TO rejection) and polish issues. The right next step is almost certainly to coordinate with the ON_ERROR table work rather than ship as a parallel feature, since users will expect a unified "divert bad rows to a side table" story covering both parse errors and constraint conflicts.

No committer has yet weighed in, so the design has not been validated against the bar the feature will ultimately face — particularly around syntax (the ON_CONFLICT TABLE, CONFLICT_TABLE x two-option form is awkward; ON_CONFLICT (redirect_to => 'tbl') or simply ON_CONFLICT TABLE tbl would read better).