POC: PLpgSQL FOREACH IN JSON ARRAY

First seen: 2026-02-28 07:10:53+00:00 · Messages: 18 · Participants: 6

Latest Update

2026-05-27 · claude-opus-4-6

Technical Analysis: PLpgSQL FOREACH IN JSON ARRAY

Core Problem

PL/pgSQL currently lacks a native iteration construct for JSON arrays. Developers who need to iterate over JSON array elements must use FOR IN SELECT FROM json_array_elements(...), which invokes the full SQL executor machinery for each iteration. This introduces substantial overhead—context switches between PL/pgSQL and the SQL executor, SPI setup/teardown, and tuple table management—for what is conceptually a simple loop over in-memory data.

The performance penalty is significant: benchmarks show a 3-4x slowdown compared to the proposed native iteration. This matters architecturally because JSON processing is increasingly central to PostgreSQL workloads, and PL/pgSQL is the dominant procedural language. The gap between "iterate over a native array" (which has FOREACH IN ARRAY) and "iterate over a JSON array" (which requires a full query) is an ergonomic and performance inconsistency.

Proposed Solution: Initial Patch

Pavel Stehule proposes a new PL/pgSQL statement:

FOREACH <var_list> IN JSON ARRAY <expression>
LOOP
  <statements>
END LOOP;

Key Design Decisions

  1. Name-based field assignment for JSON objects: Unlike FOREACH IN ARRAY which assigns by position, JSON object fields are mapped to PL/pgSQL variables by name. This is necessitated by jsonb's lack of field ordering guarantees—positional assignment would be non-deterministic.

  2. Uses json_populate_type internally: The implementation leverages existing JSON-to-type coercion infrastructure for mapping JSON values to PL/pgSQL variables, which handles type casting naturally.

  3. Memory management: A temporary memory context (tmp_cxt) is created for per-iteration allocations and reset after each iteration. The context itself is cleaned by the statement-level memory context reset, not explicitly destroyed.

  4. RECORD type limitation: Record variables require a pre-assigned type structure before use in the FOREACH statement. This is because json_populate_type needs a known TupleDesc to map fields. Supporting truly untyped records would require creating a new TupleDesc per iteration (potentially expensive with varying JSON object shapes).

  5. Lax vs. Strict mode: The initial implementation uses behavior consistent with jsonb_populate_record—cast errors are raised (effectively strict for type mismatches), but missing fields result in NULLs rather than errors. Pavel acknowledges this is an open design question, noting SQL/JSON standards use lax semantics by default.

Architectural Evolution: Subscript Routines Iterator Interface

The most significant architectural development in this thread is the pivot from a JSON-specific implementation to a generic iteration interface via SubscriptRoutines.

Peter Eisentraut's Insight

Peter Eisentraut suggested the feature shouldn't hardcode JSON arrays specifically, but instead use an iteration helper function that any type could provide. Tom Lane strongly endorsed this (+1) and proposed extending struct SubscriptRoutines to offer optional support functions for iterating through all elements of a subscriptable object.

The Iterator Interface (v2 patch)

Pavel implemented this as two new methods in the subscripting infrastructure:

struct _ForeachAIterator {
    bool (*iterate)(ForeachAIterator *self,
                    Datum *value, bool *isnull,
                    Oid *typid, int32 *typmod);
};

typedef ForeachAIterator * (*CreateForeachAIterator)(
    Datum value, Oid typid, int32 typmod,
    int slice, Oid target_typid, int32 target_typmod);

This is added to SubscriptRoutines:

typedef struct SubscriptRoutines {
    // ... existing fields ...
    CreateForeachAIterator create_foreach_a_iterator;
} SubscriptRoutines;

Architectural Implications

  1. Extensibility: Any type that implements subscripting (arrays, jsonb, hstore, custom types) can now provide iteration support for PL/pgSQL FOREACH. This is a significant extension point.

  2. Iterator pattern: The ForeachAIterator struct uses a C-level object-oriented pattern with a vtable-style function pointer. Private iterator state can be appended after the struct (noted in the comment: "Private fields might appear beyond this point...").

  3. Lifetime semantics: The iterator's iterate method is called under a short-lived memory context that is reset after each call, meaning returned Datums must be considered ephemeral. The CreateForeachAIterator routine must copy its input value to the statement context for the iterator's lifetime.

  4. Slice support: The slice parameter in CreateForeachAIterator allows multi-dimensional iteration (consistent with existing array FOREACH slice semantics), and target_typid/target_typmod enable type-aware iteration.

  5. Unified FOREACH path: This likely means the existing FOREACH IN ARRAY implementation could eventually be refactored to use the same iterator interface, unifying the code paths.

Open Design Questions

  1. Lax vs. Strict semantics: Should missing JSON fields raise errors or produce NULLs? Should type cast failures be errors or produce defaults? The current implementation follows jsonb_populate_record behavior (permissive on missing fields, strict on type casts).

  2. Syntax naming: The statement uses JSON ARRAY in the syntax. With the generic iterator interface, should there be a more general syntax, or should different types have their own keywords?

  3. Performance of the generic interface: Adding an indirection layer (virtual function calls through the iterator interface) versus the original direct jsonb iteration—whether this introduces measurable overhead needs verification.

Code Metrics

Review Quality Assessment

The patch received functional testing from Jim Jones (confirming 3x performance improvement), code review from Chao Li (finding real bugs: typo "a object" → "an object", ereport formatting issues, questioning memory management), and build verification from Shreeya Sharma. However, the "Ready for Committer" status was set before the significant architectural redesign requested by Eisentraut/Lane, making that status effectively obsolete.

The thread demonstrates a common pattern in PostgreSQL development: a feature-specific patch is generalized into infrastructure improvement at the request of senior committers, resulting in broader utility but delayed completion.