Parallel INSERT SELECT take 2 — May 2026 Monthly Summary
Overview
After five years of dormancy, this thread was revived by Tomas Vondra with a fundamental reassessment of the approach taken by the Fujitsu team (Hou, Bharath, Greg Nancarrow) to parallelize the SELECT side of INSERT ... SELECT. The month saw intense debate between two competing philosophies — user-declared parallel safety vs. auto-maintained catalog flags — and culminated in a potential convergence toward a simpler tuplestore-based materialization approach, though no new patch was posted.
Key Technical Debates
Declarative vs. Auto-Maintained Parallel Safety
The original patch series (v1–v10) introduced pg_class.relparalleldml as a user-declared flag (CREATE/ALTER TABLE ... PARALLEL DML {UNSAFE|RESTRICTED|SAFE}), analogous to proparallel on functions. This eliminates planning-time overhead for partitioned tables but shifts correctness responsibility to users.
Tomas's counter-proposal: Auto-maintain relparalleldml by cascading updates when triggers, defaults, constraints, or referenced functions change. He argues table safety is fully decidable from the catalog (unlike function safety), making user-declaration philosophically inconsistent with PostgreSQL's "the server figures it out" posture.
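The decidability claim can be illustrated with a minimal sketch. It assumes a table's safety is simply the worst proparallel marking among the functions reachable from its triggers, defaults, and constraints. The Table class and the flattened referenced_functions list are hypothetical stand-ins for catalog lookups, not anything from the patch; only the 's'/'r'/'u' markings come from pg_proc.proparallel.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Ordering of the three proparallel levels: safe < restricted < unsafe.
HAZARD_RANK = {"s": 0, "r": 1, "u": 2}


@dataclass
class Table:
    name: str
    # Hypothetical flattened view of the catalog: every function reachable
    # from the table's triggers, column defaults, and constraints.
    referenced_functions: List[str] = field(default_factory=list)


def table_parallel_dml_safety(table: Table, proparallel: Dict[str, str]) -> str:
    """Return the worst hazard among the functions the table references."""
    worst = "s"  # a table that references no functions is treated as safe
    for fn in table.referenced_functions:
        marking = proparallel[fn]
        if HAZARD_RANK[marking] > HAZARD_RANK[worst]:
            worst = marking
    return worst
```

Under this model the table-level result is a pure function of catalog contents, which is the property Tomas relies on. It still inherits whatever the user declared on each function.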
Hou's objections to auto-maintenance:
- ALTER FUNCTION ... PARALLEL UNSAFE would need to lock arbitrary user tables to update their relparalleldml, which is surprising behavior for function DDL
- Visibility gap: a concurrent, uncommitted CREATE TABLE referencing a function won't be seen by ALTER FUNCTION's dependency walk
- Complex locking obligations spread across many DDL commands that currently don't lock functions
Tomas's rebuttals: Lock cost is acceptable for rare schema-change operations; skip locks when computed safety doesn't actually change; DROP FUNCTION already walks dependents. He explicitly rejects "overly-complicated solutions for a problem that almost never happens in production."
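The "skip locks when computed safety doesn't change" rebuttal can be sketched as follows. This is a toy model under stated assumptions, not code from the thread: table_deps, relparalleldml, and tables_needing_update are hypothetical names, and a table's flag is assumed to be the worst marking among the functions it references.

```python
HAZARD_RANK = {"s": 0, "r": 1, "u": 2}  # safe < restricted < unsafe


def worst_hazard(markings):
    """Worst marking in an iterable; an empty iterable counts as safe."""
    return max(markings, key=HAZARD_RANK.__getitem__, default="s")


def tables_needing_update(changed_fn, new_marking, proparallel,
                          table_deps, relparalleldml):
    """Return {table: new_flag} only for tables whose stored flag changes.

    proparallel:    function name -> current marking
    table_deps:     table name -> functions it references (hypothetical)
    relparalleldml: table name -> currently stored flag
    """
    updated = dict(proparallel)
    updated[changed_fn] = new_marking
    changes = {}
    for table, fns in table_deps.items():
        if changed_fn not in fns:
            continue  # not a dependent: never visited, never locked
        new_flag = worst_hazard(updated[f] for f in fns)
        if new_flag != relparalleldml[table]:
            changes[table] = new_flag  # only these tables would need a lock
    return changes
```

For example, marking a function unsafe leaves a table untouched if that table already references another unsafe function, so no lock would be taken on it in this model.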
Tuplestore Materialization vs. Interleaved Execution
Tomas proposed a fundamentally simpler architecture: run the parallel SELECT to completion into a tuplestore, exit parallel mode, then INSERT serially from the tuplestore. This eliminates:
- XID-in-workers problems (no writes during parallel mode)
- Target-relation parallel-safety checks (only SELECT safety matters)
- Complex parallel-mode scoping issues at ExecutePlan level
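The two-phase shape of the proposal can be sketched as a toy model. This is an illustration only: a Python list stands in for the tuplestore, a thread pool stands in for parallel workers, and scan_chunk and insert_select are hypothetical names rather than executor functions.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(rows):
    """Stand-in for one worker's share of the SELECT: filter and project."""
    return [(r["id"], r["amount"] * 2) for r in rows if r["amount"] > 0]


def insert_select(source_rows, target, workers=4):
    # Phase 1: "parallel mode". Workers only read; nothing is written.
    chunks = [source_rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tuplestore = [t for part in pool.map(scan_chunk, chunks) for t in part]
    # Leaving the with-block shuts the pool down, modelling the exit from
    # parallel mode before any write happens.

    # Phase 2: the leader alone inserts from the materialized result.
    for tup in tuplestore:
        target.append(tup)
    return len(tuplestore)
```

Because every write happens after the workers are gone, the model never needs to ask whether the target is safe to modify in parallel. The cost is holding the whole SELECT result before the first row is inserted.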
Hou's concession: He explicitly stated "I don't oppose the alternative idea" and acknowledged that the tuplestore infrastructure would remain useful even after a future parallel INSERT lands, since tables marked unsafe would still need it. This was a meaningful softening from his earlier defense of the interleaved approach.
EPQ as a Fundamental Barrier Beyond INSERT
Tomas's most substantive new contribution: EvalPlanQual (READ COMMITTED row rechecking) blocks extending parallel DML to UPDATE/DELETE/MERGE. INSERT is special because there's no existing row to recheck. With materialization, the parallel workers and plan state are gone when EPQ would need to inject tuples for rechecking.
Five speculative approaches were enumerated (full parallel EPQ, EPQ-safe plan identification, leader-only EPQ with separate plan, no EPQ with full-statement retry, SERIALIZABLE-only) — none deemed clean solutions. This strengthens the case for treating INSERT as a self-contained deliverable rather than a stepping stone.
Feature Decomposition
Tomas explicitly separated:
- (a) INSERT + PARALLEL SELECT — what this patch delivers (SELECT runs in parallel, INSERT in leader)
- (b) PARALLEL INSERT + SELECT — future work with actual writes in workers
He argues (a) should not pay design taxes for (b), and his tuplestore proposal does not make (b) harder.
Current Patch Structure (v10)
- 0001: DDL syntax, pg_class.relparalleldml column, \dt+ support, reject for foreign/temp tables
- 0002: Planner changes, consulting the flag in is_parallel_allowed_for_modify(); leader-assigned XID
- 0003: pg_get_table_parallel_dml_safety() / pg_get_table_max_parallel_dml_hazard() helper SRFs
- 0004: Regression tests
The runtime function-check piece was split into a separate thread because of its FmgrBuiltin redesign implications.
Status at Month End
- Thread appears to be converging toward Tomas's simpler tuplestore approach for scoped (a)-only delivery
- No concrete patch implementing the tuplestore approach has been posted
- No committer has weighed in on whether the tuplestore-based approach is an acceptable scoped deliverable
- The auto-maintenance vs. declarative debate remains unresolved in principle but may become moot if the tuplestore approach is adopted, since target-relation safety checking would then be unnecessary