COPY FROM with RLS

First seen: 2025-10-28 12:12:30+00:00 · Messages: 2 · Participants: 1

Latest Update

2026-05-11 · opus 4.7

COPY FROM with Row-Level Security: Bridging the Bulk-Load / Policy Enforcement Gap

Background and Architectural Problem

PostgreSQL's Row-Level Security (RLS) machinery is tightly coupled to the query rewriter and the ModifyTable executor node. When a user runs INSERT, UPDATE, DELETE, or MERGE against a table with RLS enabled, the rewriter (get_row_security_policies in rewrite/rowsecurity.c) expands applicable policies into WithCheckOption (WCO) nodes attached to the Query tree. The planner lowers these onto ModifyTable.withCheckOptionLists, and at execution time InitResultRelInfo / ExecInitPartitionInfo populates ResultRelInfo.ri_WithCheckOptions (the raw expression) and ri_WithCheckOptionExprs (the compiled ExprState). ExecWithCheckOptions() is then invoked per row just before ExecConstraints() in ExecInsert/ExecUpdate.

COPY FROM, however, deliberately short-circuits this pipeline. It does not build a Query, it does not invoke the rewriter, and it does not construct a ModifyTable plan node. It builds a minimal ResultRelInfo directly in copyfrom.c, opens the relation, runs BeginCopyFrom, and feeds tuples through table_tuple_insert (or multi-insert buffering). Consequently, WITH CHECK policies declared via CREATE POLICY ... WITH CHECK (...) are never evaluated on the COPY path. The current code sidesteps the issue by requiring that the invoking role have the baseline INSERT privilege and by honoring SELECT policies on COPY TO via the existing rewriter route, but there has been no mechanism to enforce INSERT-time WCO quals on bulk load.

This is both a correctness/consistency issue (two INSERT-shaped paths with divergent security semantics) and a usability issue: administrators who design RLS policies to constrain what rows a role can insert cannot trust COPY FROM to respect them, forcing awkward workarounds (revoke COPY, funnel through INSERT, use \copy + server-side triggers, etc.).

The Proposed Approach

Jian He's patch tackles this by wiring the standard RLS plumbing into CopyFrom. The conceptual steps mirror what transformInsertStmt + planner + executor would do for an ordinary INSERT:

  1. Call get_row_security_policies() to harvest the applicable WCO list for the target relation.
  2. Compile each WCO expression into an ExprState and attach them to resultRelInfo->ri_WithCheckOptions and ri_WithCheckOptionExprs.
  3. Invoke ExecWithCheckOptions(WCO_RLS_INSERT_CHECK, ...) in the CopyFrom per-tuple loop, positioned just before ExecConstraints() to mirror ExecInsert's ordering (so that CHECK constraints and NOT NULL still run even if policies pass, and so that an RLS failure is raised with the same priority as in the INSERT path).

The partitioning complication

ExecInitPartitionInfo — the routine that lazily builds per-leaf-partition ResultRelInfos during tuple routing — reads WCO lists off the parent ModifyTable node (node->withCheckOptionLists) and re-maps them through the partition's attribute map. This is the only supported way to propagate WCOs onto dynamically-opened partition ResultRelInfos. To reuse it, Jian's patch synthesizes a minimal ModifyTable PlanState whose sole purpose is to carry withCheckOptionLists. The author flags this as "feels like a hack, but since we only care about ModifyTable.withCheckOptionLists so the implication is limited." Architecturally this is the cleanest available seam: the alternative would be to duplicate the partition-WCO mapping logic in copy code, which would create a maintenance hazard every time RLS / partitioning semantics change.

The subquery-in-policy problem

RLS policies can contain arbitrary expressions including SubLinks (e.g., USING (tenant_id IN (SELECT id FROM allowed_tenants))). A bare ExprState evaluated via ExecWithCheckOptions cannot execute SubPlans unless those SubPlans have been through the planner (which turns SubLink into SubPlan/InitPlan and registers them on the surrounding PlanState.subPlan list) and unless there is an enclosing EState/PlanState properly initialized with a param list.

The initial (2025-10-28) patch simply errors out when any applicable policy contains a sublink — a pragmatic but significant functional gap, since subquery-based policies are common in multi-tenant designs.

The follow-up (2026-05-10) patch removes that restriction with a novel mechanism: it constructs a dummy INSERT INTO "schema"."rel" DEFAULT VALUES statement, runs it through pg_analyze_and_rewrite_fixedparams()pg_plan_query()CreateQueryDesc()ExecutorStart(), and then deliberately never calls ExecutorRun(). The resulting QueryDesc/EState/ModifyTableState tree — with fully-planned SubPlans, an initialized parameter ecosystem, and properly populated ResultRelInfos (including partitioned ones) — is commandeered by CopyFrom, which feeds its file-sourced tuples into the borrowed ResultRelInfo and calls ExecWithCheckOptions with a fully functional execution context.

This is a striking design: it effectively treats the planner/executor startup machinery as a library for building an RLS evaluation harness, decoupling it from the question of where tuples come from. For COPY TO with RLS the patch takes an analogous but simpler route — rewriting COPY rel TO ... into COPY (SELECT ... FROM rel) TO ... so the existing SELECT-policy rewrite path fires naturally.

Technical Tradeoffs and Open Questions

  1. Semantic fidelity vs. layering violation. The dummy-INSERT approach guarantees that every corner of RLS semantics — SubLinks, volatile functions in policy quals, LEAKPROOF considerations, security-barrier handling, CURRENT_USER/SESSION_USER under SET ROLE, policies combined with ALL/PERMISSIVE/RESTRICTIVE logic — is handled identically to real INSERT, because it is a real INSERT (minus the ExecutorRun). The cost is that COPY now pays planner overhead (once per COPY, amortized) and has a second execution state object to manage alongside its own. Memory-context lifetime and snapshot management between the two states need careful auditing.

  2. BEFORE INSERT triggers, generated columns, defaults. COPY FROM already has its own handling for these; with a synthetic ModifyTableState in play, care is needed to ensure these run exactly once and in the right order. DEFAULT VALUES in the dummy query implies the planner will construct default expressions for all columns — these must not be accidentally evaluated during COPY's per-row work.

  3. Multi-insert batching. COPY FROM heavily optimizes with table_multi_insert. WCO evaluation must happen before a tuple enters the multi-insert buffer (since a failure should abort before any rows of that batch are flushed, and should attribute the error to the specific offending row). This mirrors how ExecConstraints is already sequenced in the COPY loop.

  4. Statistics / pg_stat_statements. Running a dummy INSERT through analyze+plan may produce spurious planner cache entries or statistics artifacts unless explicitly suppressed.

  5. Ordering relative to BEFORE ROW triggers. In ExecInsert, WCOs are checked after BEFORE ROW triggers fire (so that a trigger can transform a row into one that passes policy). The COPY patch must preserve this ordering; placing ExecWithCheckOptions "right above ExecConstraints" is consistent with this only if BEFORE triggers have already executed by that point in the COPY loop — which they do.

  6. INSTEAD OF triggers and foreign tables are correctly excluded by Jian — RLS only applies to plain and partitioned tables, and COPY's target-kind checks already prevent the other cases.

Assessment

The feature is genuinely desirable and closes a real security-consistency gap. The first patch's approach (direct WCO harvesting + synthetic ModifyTable for partition routing) is architecturally defensible but incomplete because of the subquery restriction. The second patch's "plan a dummy INSERT, steal its state, skip ExecutorRun" trick is clever and comprehensive but raises legitimate questions about layering — it's the kind of design that committers tend to push back on, preferring instead either (a) refactoring get_row_security_policies and the WCO initialization code into a reusable helper that COPY can call without needing a fake Query, or (b) teaching COPY to build a proper (non-dummy) plan tree that represents what it's actually doing. As of the available messages there is no committer response on-thread, so the design is still in "proposal" stage.