Function scan FDW pushdown

First seen: 2026-03-18 12:08:49+00:00 · Messages: 2 · Participants: 1

Latest Update

2026-05-06 · opus 4.7

Function Scan FDW Pushdown — Technical Analysis

Context and Core Problem

This thread concerns a long-running patch series by Alexander Pyhalov (Postgres Pro) to extend postgres_fdw — and the FDW infrastructure more broadly — so that function RTEs (RangeTblEntry of kind RTE_FUNCTION) can participate in pushdown to a remote server. Today, when a query joins a foreign table against a set-returning function (SRF) such as generate_series(), unnest(), or a user-defined function, the planner has no way to express the function scan on the remote side. The function is always executed locally, and the join must be executed locally as well, which forces large foreign relations to be fetched back to the coordinator even when the join cardinality would be small.

Architecturally the problem sits at the intersection of three subsystems:

  1. Path generation for joins involving non-base relations. postgres_fdw's existing pushdown machinery (foreign_join_ok, deparseSelectStmtForRel) assumes both sides of a join are base foreign relations (or already-pushed-down joins) belonging to the same user mapping / server. A FUNCTION RTE is neither a base rel nor a foreign rel — it has no ForeignScan plan node and no PgFdwRelationInfo. Integrating it requires the "asymmetric join" concept (local-side relation pushed into a remote join), a topic previously explored by Pyhalov in the asymmetric partition-wise join and "pushdown of constants/VALUES" patch series.

  2. Deparsing a function expression as a remote FROM item. The deparser must emit a ROWS FROM (...) / function-call clause that is safe to execute remotely. That requires: (a) the function must be known to exist with identical semantics on the remote (the usual check_shippable/is_shippable gating via extensions option), (b) its argument expressions must themselves be shippable, and (c) the function must not depend on executor-only state such as Param nodes referring to outer plan values, because those cannot be resolved inside the remote SQL text.

  3. Plan-tree bookkeeping across set_plan_references. Once the planner rewrites varnos into OUTER_VAR/INNER_VAR and compacts the rangetable, the FDW must still be able to reconstruct which function RTE was folded into which ForeignScan. This is the "tricky part" the author explicitly calls out.

What the Patch Does

From the two messages we see three concrete design choices that have solidified by the March 2026 revision:

1. Restricting which functions are pushable

The patch now forbids pushdown for:

The composite-type restriction is a deparser/tuple-descriptor safety measure: a remote ROWS FROM (f(...)) result must be projected into a TupleDesc that matches the local expectation, and composite-returning SRFs expand into multiple columns whose types and names come from the function's declared signature. Round-tripping that through a remote query — where the remote server may resolve the composite type differently, especially if the function is polymorphic or returns RECORD with a column definition list — is fragile. Disallowing it is the conservative correct choice for a first commit.

The Param-in-arguments restriction matters because function RTE arguments are evaluated once per scan rescan, typically driven by outer tuples in a nested loop (LATERAL). If those arguments contain PARAM_EXEC nodes, the value lives in the executor's es_param_exec_vals and cannot be textually inlined into remote SQL at plan time. Supporting this would require parameterized remote execution (sending parameter values on each rescan), which is a substantially larger project touching postgresGetForeignPlan, create_cursor, and the prepared-statement reuse path. Excluding it keeps the patch tractable.

2. Storing function metadata in ForeignScan.fdw_private

Pyhalov and Gleb Kashkin moved the per-function bookkeeping into the ForeignScan's private data. Previously (presumably) it lived on PgFdwRelationInfo or a side structure keyed by RTI, which breaks after set_plan_references renumbers and removes unused RTEs. By serializing the information into fdw_private (which goes through _copyForeignScan/_outForeignScan and survives setrefs intact as a List * of Nodes), the executor-time deparse and the EXPLAIN path can still identify the functions that were folded into the scan.

The "correct rti mappings after setrefs" comment implies they keep either the original RTIs or a mapping list, and re-resolve them during postgresBeginForeignScan / explain. This is the same pattern used for pushed-down joins, where fs_relids records the original relids.

3. Decoupling from the asymmetric-join patch

The author explicitly removed changes that were only needed for asymmetric join. This is strategically important: the asymmetric-join patch has been in review limbo for several cycles, and tying function-scan pushdown to it would block both. By narrowing scope to cases that work within the existing join-pushdown framework (i.e. joins between foreign tables where a function RTE appears as an additional FROM item that can be deparsed inline), the patch becomes independently committable. It targets PostgreSQL 20.

Key Technical Insights

Participants and Weight of Opinion

The two visible messages are both rebase/update posts from the author; no substantive technical disagreement appears in this excerpt. The direction — narrow scope, conservative restrictions, decouple from asymmetric join — reads as a direct response to reviewer guidance.

Outlook

The patch is positioned for PG20 and has been reduced to a self-contained feature. The main review risks that remain:

  1. Proving the restriction set is actually sufficient (e.g., VOLATILE functions, functions with SETOF record but fixed column list via coldeflist, polymorphic functions instantiated to composite).
  2. Cost modeling — a pushed-down function scan's rowcount/cost should still reflect local prorows, but join selectivity on the remote may differ.
  3. EXPLAIN output clarity: showing the function call inside the remote SQL rather than as a separate Function Scan node.