Make transformAExprIn() return a flattened bool expression directly

First seen: 2026-04-24 06:06:24+00:00 · Messages: 4 · Participants: 3

Latest Update

2026-06-01 · claude-opus-4-6

Make transformAExprIn() Return a Flattened Bool Expression Directly

Core Problem

In PostgreSQL's query transformation pipeline, the function transformAExprIn() (located in src/backend/parser/parse_expr.c) is responsible for converting SQL IN clauses into internal expression trees. When an IN list contains a mix of constant values and non-constant expressions (e.g., column references from other tables), the function currently produces a nested binary tree of OR expressions rather than a flat list.

For example, given:

SELECT * FROM t1, t2, t3 WHERE t1.id IN (1, 2, t2.id, t3.id, 3);

The current behavior produces a nested structure like:

OR(OR(ScalarArrayOp(t1.id, {1,2,3}), OpExpr(t1.id = t2.id)), OpExpr(t1.id = t3.id))

This is a right-deep (or left-deep) binary tree of BoolExpr OR nodes. PostgreSQL's internal BoolExpr node is designed to hold a flat list of arguments — makeBoolExpr() takes a List *args — but transformAExprIn() constructs the result by progressively wrapping previous results in new OR nodes, creating unnecessary nesting.

Why This Matters Architecturally

The Parse-Analyze → Planner Pipeline

PostgreSQL processes queries through distinct phases: parsing → analysis/transformation → rewriting → planning → execution. The transformation phase (where transformAExprIn() lives) produces the query tree that feeds into the planner. The planner's preprocess_qual_conditions() function does flatten nested OR/AND expressions via pull_ors()/pull_ands() in prepqual.c, so the final plan is identical regardless of whether the input is nested or flat.

However, producing the flat structure directly in the parser/analyzer has several advantages:

  1. Reduced planner work: The planner doesn't need to traverse and rebuild the expression tree to flatten it. While this is not expensive for small IN lists, it represents unnecessary work that scales with the size of the IN clause.

  2. Cleaner intermediate representation: Any code that inspects the raw parse tree (query rewrite rules, pg_stat_statements normalization, debug output, extensions using post-parse hooks) sees a canonical flat structure.

  3. Principle of least surprise: BoolExpr nodes are semantically flat lists. Producing a binary tree of BoolExpr nodes during transformation is an implementation artifact, not intentional structure.

  4. Consistency: Other parts of the parser already produce flat BoolExpr lists (e.g., when transforming explicit a OR b OR c syntax). The IN-clause handling is an outlier.

Proposed Solution

V1 Patch (ChangAo Chen)

The original patch modifies transformAExprIn() to collect all the transformed sub-expressions (both the ScalarArrayOpExpr for constants and individual OpExpr nodes for non-constant elements) into a single flat list, then constructs a single BoolExpr OR node with all elements as direct children.

The key transformation changes from the iterative approach:

// Pseudo-code of current behavior:
result = NULL;
foreach(element in rexprs) {
    cmp = make_op_expr(...);
    if (result == NULL)
        result = cmp;
    else
        result = makeBoolExpr(OR, list_make2(result, cmp));
}
// Combine with ScalarArrayOp for constants
if (result && have_constants)
    result = makeBoolExpr(OR, list_make2(scalar_array_op, result));

To:

// Pseudo-code of proposed behavior:
flat_list = NIL;
if (have_constants)
    flat_list = lappend(flat_list, scalar_array_op);
foreach(element in rexprs) {
    cmp = make_op_expr(...);
    flat_list = lappend(flat_list, cmp);
}
if (list_length(flat_list) > 1)
    result = makeBoolExpr(OR, flat_list);
else
    result = linitial(flat_list);

V2 Patch (Jian Guo)

The refined v2 patch keeps all core logic inside the existing foreach(l, rexprs) loop to minimize code intrusion and preserve the function's structural integrity. It also adds test cases covering:

Technical Details

How transformAExprIn() Works

The function handles two distinct cases:

  1. Scalar IN: expr IN (val1, val2, ...) — splits the right-hand side into constants (grouped into a ScalarArrayOpExpr for efficient evaluation) and non-constants (each becoming an individual OpExpr).

  2. Row IN: (expr1, expr2) IN ((v1, v2), (v3, v4)) — each tuple comparison becomes a RowCompareExpr or conjunction of OpExpr nodes.

The patch primarily affects case 1 when there's a mix of constants and non-constants, which is when nesting occurs in the current code.

Performance Implications

The optimization is essentially zero-cost at parse time — collecting items into a list and building a single BoolExpr is actually simpler than the current approach of wrapping in nested BoolExprs. The savings come later:

The savings are modest for typical queries but the patch represents a correctness-of-representation improvement that eliminates an unnecessary normalization step.

No Behavioral Change

As demonstrated by the thread's EXPLAIN output, the final execution plan is identical before and after the patch. The planner already handles both representations. This is purely an internal representation optimization that produces a more canonical form earlier in the pipeline.

Potential Concerns

  1. Regression risk: Any code that assumes a specific nesting structure from transformAExprIn() could break. However, since the planner already flattens these, downstream code should handle flat lists.

  2. View/rule storage: Since the raw parse tree is stored for views and rules, this changes what gets stored. However, the flat form is equally valid and actually preferable.

  3. pg_dump compatibility: Expression deparsing should handle flat OR lists correctly since they're already produced by other code paths.