Parallel INSERT SELECT take 2

First seen: 2021-04-12 01:21:57+00:00 · Messages: 48 · Participants: 8

Latest Update

2026-06-01 · claude-opus-4-6

Parallel INSERT SELECT take 2 — May 2026 Monthly Summary

Overview

After five years of dormancy, this thread was revived by Tomas Vondra with a fundamental reassessment of the approach taken by the Fujitsu team (Hou, Bharath, Greg Nancarrow) to parallelize the SELECT side of INSERT ... SELECT. The month saw intense debate between two competing philosophies — user-declared parallel safety vs. auto-maintained catalog flags — and culminated in a potential convergence toward a simpler tuplestore-based materialization approach, though no new patch was posted.

Key Technical Debates

Declarative vs. Auto-Maintained Parallel Safety

The original patch series (v1–v10) introduced pg_class.relparalleldml as a user-declared flag (CREATE/ALTER TABLE ... PARALLEL DML {UNSAFE|RESTRICTED|SAFE}), analogous to proparallel on functions. This eliminates planning-time overhead for partitioned tables but shifts correctness responsibility to users.

Tomas's counter-proposal: Auto-maintain relparalleldml by cascading updates when triggers, defaults, constraints, or referenced functions change. He argues table safety is fully decidable from the catalog (unlike function safety), making user-declaration philosophically inconsistent with PostgreSQL's "the server figures it out" posture.

Hou's objections to auto-maintenance:

  1. ALTER FUNCTION ... PARALLEL UNSAFE would need to lock arbitrary user tables to update their relparalleldml — surprising behavior for function DDL
  2. Visibility gap: concurrent CREATE TABLE (uncommitted) referencing a function won't be seen by ALTER FUNCTION's dependency walk
  3. Complex locking obligations spread across many DDL commands that currently don't lock functions

Tomas's rebuttals: Lock cost is acceptable for rare schema-change operations; skip locks when computed safety doesn't actually change; DROP FUNCTION already walks dependents. He explicitly rejects "overly-complicated solutions for a problem that almost never happens in production."

Tuplestore Materialization vs. Interleaved Execution

Tomas proposed a fundamentally simpler architecture: run the parallel SELECT to completion into a tuplestore, exit parallel mode, then INSERT serially from the tuplestore. This eliminates:

Hou's concession: He explicitly stated "I don't oppose the alternative idea" and acknowledged the tuplestore infrastructure would remain useful even after future parallel-INSERT lands (needed for unsafe-marked tables). This was a meaningful softening from his earlier defense of the interleaved approach.

EPQ as a Fundamental Barrier Beyond INSERT

Tomas's most substantive new contribution: EvalPlanQual (READ COMMITTED row rechecking) blocks extending parallel DML to UPDATE/DELETE/MERGE. INSERT is special because there's no existing row to recheck. With materialization, the parallel workers and plan state are gone when EPQ would need to inject tuples for rechecking.

Five speculative approaches were enumerated (full parallel EPQ, EPQ-safe plan identification, leader-only EPQ with separate plan, no EPQ with full-statement retry, SERIALIZABLE-only) — none deemed clean solutions. This strengthens the case for treating INSERT as a self-contained deliverable rather than a stepping stone.

Feature Decomposition

Tomas explicitly separated:

He argues (a) should not pay design taxes for (b), and his tuplestore proposal does not make (b) harder.

Current Patch Structure (v10)

Runtime function-check piece was split to a separate thread due to FmgrBuiltin redesign implications.

Status at Month End

History (1 prior analysis)
2026-06-01 · claude-opus-4-6

New in this round: External patch from Joel Jacobson et al. addresses the function-locking concurrency gap

Joel Jacobson (joel@compiler.org) posts from a separate thread [1], offering a patch that directly addresses the concurrency problem Hou raised earlier — the visibility gap when ALTER FUNCTION races against concurrent DDL that creates dependencies on that function.

What the patch does

0001: "Serialize routine definition changes with dependency recording"

  • Takes AccessExclusiveLock on the pg_proc object (the function's OID) before CREATE OR REPLACE FUNCTION updates an existing routine, and before ALTER FUNCTION changes routine properties.
  • Dependency recording (pg_depend insertion) now locks referenced objects before inserting rows.
  • After acquiring the lock and waiting, the code refetches the pg_proc tuple and repeats ownership/object-kind checks (handling the case where the function was dropped or reassigned while waiting).

The change is minimal: 51 lines across pg_proc.c and functioncmds.c.

Why this is relevant to the parallel INSERT SELECT discussion

This patch provides exactly the serialization mechanism Hou described as "technically possible but complex" — locking the function during DDL that references it, so that ALTER FUNCTION ... PARALLEL UNSAFE cannot race with concurrent table/index creation that embeds that function in an expression. Specifically:

  1. Closes Hou's visibility gap: If ALTER FUNCTION takes AccessExclusiveLock on the pg_proc entry, and dependency recording (during CREATE TABLE/INDEX) takes a lock on the same entry before inserting pg_depend, the two operations serialize. The auto-maintained relparalleldml approach becomes viable because ALTER FUNCTION can now safely walk all committed dependents knowing no new ones are being concurrently created.

  2. Minimal footprint: The patch does not spread locking across "many DDL commands that currently don't lock functions" (Hou's objection) — instead it centralizes the lock at the point where dependencies are recorded, which is already a single code path.

  3. Pre-existing bug fix: The patch also fixes the existing race Hou noted where DROP FUNCTION concurrent with CREATE TABLE (referencing the function in a default) results in "cache lookup failed for function."

Strategic implication

If this patch (or something like it) is accepted independently, it materially weakens Hou's primary technical objection to Tomas's auto-maintained relparalleldml proposal. The "complex locking" that Hou argued made auto-maintenance unattractive is reduced to a small, already-implemented change. This potentially tips the balance toward auto-maintenance over the declarative/user-labeled approach.

What remains unresolved

  • No response yet from Hou or Tomas on whether this patch fully addresses the concurrency scenario.
  • The patch comes from a separate thread and may have its own review lifecycle; it's unclear whether it would be committed as a prerequisite or independently.
  • The broader architectural question (tuplestore vs. interleaved, INSERT-only vs. general DML) remains open.