Technical Analysis: PLpgSQL FOREACH IN JSON ARRAY
Core Problem
PL/pgSQL currently lacks a native iteration construct for JSON arrays. Developers who need to iterate over JSON array elements must use FOR IN SELECT FROM json_array_elements(...), which invokes the full SQL executor machinery for each iteration. This introduces substantial overhead—context switches between PL/pgSQL and the SQL executor, SPI setup/teardown, and tuple table management—for what is conceptually a simple loop over in-memory data.
The performance penalty is significant: benchmarks show a 3-4x slowdown compared to the proposed native iteration. This matters architecturally because JSON processing is increasingly central to PostgreSQL workloads, and PL/pgSQL is the dominant procedural language. The gap between "iterate over a native array" (which has FOREACH IN ARRAY) and "iterate over a JSON array" (which requires a full query) is an ergonomic and performance inconsistency.
Proposed Solution: Initial Patch
Pavel Stehule proposes a new PL/pgSQL statement:
FOREACH <var_list> IN JSON ARRAY <expression>
LOOP
<statements>
END LOOP;
Key Design Decisions
-
Name-based field assignment for JSON objects: Unlike
FOREACH IN ARRAYwhich assigns by position, JSON object fields are mapped to PL/pgSQL variables by name. This is necessitated by jsonb's lack of field ordering guarantees—positional assignment would be non-deterministic. -
Uses
json_populate_typeinternally: The implementation leverages existing JSON-to-type coercion infrastructure for mapping JSON values to PL/pgSQL variables, which handles type casting naturally. -
Memory management: A temporary memory context (
tmp_cxt) is created for per-iteration allocations and reset after each iteration. The context itself is cleaned by the statement-level memory context reset, not explicitly destroyed. -
RECORD type limitation: Record variables require a pre-assigned type structure before use in the FOREACH statement. This is because
json_populate_typeneeds a knownTupleDescto map fields. Supporting truly untyped records would require creating a new TupleDesc per iteration (potentially expensive with varying JSON object shapes). -
Lax vs. Strict mode: The initial implementation uses behavior consistent with
jsonb_populate_record—cast errors are raised (effectively strict for type mismatches), but missing fields result in NULLs rather than errors. Pavel acknowledges this is an open design question, noting SQL/JSON standards use lax semantics by default.
Architectural Evolution: Subscript Routines Iterator Interface
The most significant architectural development in this thread is the pivot from a JSON-specific implementation to a generic iteration interface via SubscriptRoutines.
Peter Eisentraut's Insight
Peter Eisentraut suggested the feature shouldn't hardcode JSON arrays specifically, but instead use an iteration helper function that any type could provide. Tom Lane strongly endorsed this (+1) and proposed extending struct SubscriptRoutines to offer optional support functions for iterating through all elements of a subscriptable object.
The Iterator Interface (v2 patch)
Pavel implemented this as two new methods in the subscripting infrastructure:
struct _ForeachAIterator {
bool (*iterate)(ForeachAIterator *self,
Datum *value, bool *isnull,
Oid *typid, int32 *typmod);
};
typedef ForeachAIterator * (*CreateForeachAIterator)(
Datum value, Oid typid, int32 typmod,
int slice, Oid target_typid, int32 target_typmod);
This is added to SubscriptRoutines:
typedef struct SubscriptRoutines {
// ... existing fields ...
CreateForeachAIterator create_foreach_a_iterator;
} SubscriptRoutines;
Architectural Implications
-
Extensibility: Any type that implements subscripting (arrays, jsonb, hstore, custom types) can now provide iteration support for PL/pgSQL FOREACH. This is a significant extension point.
-
Iterator pattern: The
ForeachAIteratorstruct uses a C-level object-oriented pattern with a vtable-style function pointer. Private iterator state can be appended after the struct (noted in the comment: "Private fields might appear beyond this point..."). -
Lifetime semantics: The iterator's
iteratemethod is called under a short-lived memory context that is reset after each call, meaning returned Datums must be considered ephemeral. TheCreateForeachAIteratorroutine must copy its input value to the statement context for the iterator's lifetime. -
Slice support: The
sliceparameter inCreateForeachAIteratorallows multi-dimensional iteration (consistent with existing array FOREACH slice semantics), andtarget_typid/target_typmodenable type-aware iteration. -
Unified FOREACH path: This likely means the existing
FOREACH IN ARRAYimplementation could eventually be refactored to use the same iterator interface, unifying the code paths.
Open Design Questions
-
Lax vs. Strict semantics: Should missing JSON fields raise errors or produce NULLs? Should type cast failures be errors or produce defaults? The current implementation follows
jsonb_populate_recordbehavior (permissive on missing fields, strict on type casts). -
Syntax naming: The statement uses
JSON ARRAYin the syntax. With the generic iterator interface, should there be a more general syntax, or should different types have their own keywords? -
Performance of the generic interface: Adding an indirection layer (virtual function calls through the iterator interface) versus the original direct jsonb iteration—whether this introduces measurable overhead needs verification.
Code Metrics
- Initial patch: ~400 lines of new code
- Files modified:
pl_gram.y(parser),pl_exec.c(executor),plpgsql.h(data structures),plpgsql.sgml(documentation) - Second patch extends:
src/include/nodes/subscripting.h(core infrastructure)
Review Quality Assessment
The patch received functional testing from Jim Jones (confirming 3x performance improvement), code review from Chao Li (finding real bugs: typo "a object" → "an object", ereport formatting issues, questioning memory management), and build verification from Shreeya Sharma. However, the "Ready for Committer" status was set before the significant architectural redesign requested by Eisentraut/Lane, making that status effectively obsolete.
The thread demonstrates a common pattern in PostgreSQL development: a feature-specific patch is generalized into infrastructure improvement at the request of senior committers, resulting in broader utility but delayed completion.