Row pattern recognition

First seen: 2023-06-25 12:05:09+00:00 · Messages: 363 · Participants: 15

Latest Update

2026-06-04 · claude-opus-4-6

New in this round: varId encoding decision settled, Jian's executor questions answered, empty-pattern rationale documented

1. varId Encoding: Option (3) Chosen — Reserve 16 Control Codes at 0xF0 Boundary

Henson raised the discovery that varId 251 is a "hole" — never assigned but classified as a variable by the macro. He presented three options for the uint8 varId space:

(1) Leave the hole as a documented reserve (status quo + docs)
(2) Fill it as a 252nd variable (spend the last free slot)
(3) Reserve 16 control codes (vars 0..239, control 240..255) — costs 12 variable slots but provides 12 spare control codes for future sentinels (anchors, exclusions) while keeping match history at 1 byte per row

Decision: (3) was unanimously chosen (Ishii, Jian, Henson all agree). The key argument: this is a narrowing (240 max vars instead of 251), which is only ABI-compatible pre-release. Post-release it would be a breaking change. 240 variables far exceeds any realistic pattern. The change goes into v48 core.

Jian added a supporting rationale: PostgreSQL regex uses 14 special characters, so reserving comparable headroom for future control codes is future-proof.

2. Jian's Executor Review Items — Three Responses from Henson

Double-free concern (matchedState vs ctx->states): Henson explains they are disjoint by construction. nfa_advance() detaches ctx->states and rebuilds from scratch; the state reaching FIN is moved to matchedState and never re-added to the list. ExecRPRFreeContext frees both sets without overlap.

continue after frame-boundary forced mismatch: Henson explains it is NOT dead code. Without it, a finalized context would be matched a second time in the same row — a "context-matched-twice" error. It appears harmless in regression tests only because the forced mismatch already emptied ctx->states, making the second iteration a no-op. But it would also re-run nfa_reevaluate_dependent_vars(), overwriting shared nfaVarMatched state that other contexts read.

README.rpr XII-4 Memory Pool section: Jian suggested simplifying to just state that RPRNFAState/RPRNFAContext are allocated in a partition-lifespan memory context destroyed in release_partition. Henson agrees and will replace the rationale list with that wording.

3. Empty Pattern Left Out of Scope — Formal Rationale Published

Henson posted a detailed analysis of why PATTERN (()) stays a syntax error. The argument:

Case 1 (whole pattern empty): Rejected by existing DEFINE rules — DEFINE is mandatory, cannot be empty, and every DEFINE variable must appear in PATTERN. Zero-variable patterns cannot satisfy this.
Case 2 (embedded empty): The empty pattern is the identity element under concatenation (A () B = A B), removable by existing AST simplification. If the fold removes everything, falls to Case 1.
Empty alternative (X | ()): Rewrites to (X)? — an existing nullable construct. Branch order maps to greediness.

None of these would require a new executor-level element. Supporting it would need: a grammar production, an empty AST node, concatenation-identity deletion in the SEQ pass, and optionally extended quantifier composition — all new work with no practical benefit.

4. Quantifier Composition Over-Rejection Documented

As an aside, Henson documents that tryMultiplyQuantifiers in the AST optimizer conservatively rejects some safe folds. The full foldable class is (X{p,q}){m,n} where reachable counts are contiguous (no gaps). The current guard only folds when outer is exact, child is {1,1}, or both are unbounded. Nullable children (p=0) and span-bridging children are safe but currently left as nested groups. This is explicitly flagged as low-priority optimization, not a correctness issue. Examples verified on the branch via EXPLAIN output.

5. Patch Organization: README.rpr Moving to Docs Patch

Ishii suggested moving README.rpr from patch 0005 (executor) to patch 0006 (docs) for cleaner categorization by type (code/docs/tests). Henson immediately agreed (+1).

6. Minor Items

nfa_reevaluate_dependent_vars() missing forward declaration and misplaced in "API exposed to nodeWindowAgg.c" section — Henson will fix
Old-style ereport() with outer parens in 0025 — already fixed in 0007 for most, Henson will update 0025
Non-ASCII character issue from prior round confirmed already handled in nocfbot-0004

History (3 prior analyses)

2026-06-01 · claude-opus-4-6

New in this round: Ishii flags non-ASCII character corruption in v47-0002 patch comments

A single brief message from Ishii (2026-06-01) noting a cosmetic/encoding issue: the v47-0002 patch contains replacement characters ("�") where Unicode section signs (§) should appear in comments referencing ISO spec sections (§6.5 and §4.16).

This is a minor patch hygiene issue — the section-sign characters were likely corrupted during patch generation or email transport (UTF-8 → Latin-1 mismatch or similar). No technical arguments, no architectural discussion, no position changes.

No substantive progress

No new patches, no technical debate, no design decisions. This is a minor formatting/encoding bug report on existing patch text.

2026-06-01 · claude-opus-4-6

Row Pattern Recognition — May 2026 Monthly Summary

Overview

May 2026 saw the RPR patch set move from v47 toward v48, with activity focused on polish, standards compliance, and code review rather than architectural changes. The core NFA engine, navigation redesign, and planner guards from earlier months remained stable. Three main threads of work advanced: Henson Choi's 15-patch follow-up series addressing cleanup and deferred items, Ishii's documentation and correctness contributions, and Jian He's first deep NFA code review.

Key Developments

1. Recursive-CTE Prohibition (B7) — Resolved

The longest-standing deferred item was closed. Ishii posted the relevant ISO/IEC 19075-5 §6.17.5 text confirming that "row pattern matching is prohibited in recursive queries." Henson then implemented the rejection in transformWithClause(), operating on the raw parse tree before per-CTE analysis. Key decisions from Ishii's review:

Error code changed from ERRCODE_FEATURE_NOT_SUPPORTED to ERRCODE_SYNTAX_ERROR — the standard prohibits the combination (permanently illegal), not merely unimplemented.
CREATE RECURSIVE VIEW is already rewritten to WITH RECURSIVE by makeRecursiveViewSelect(), so a single check covers both forms.
A new RPCommonSyntax.location field provides PATTERN keyword position for error messages.

2. DEFINE Walker Unification and Semantic Tightening

Four separate DEFINE walkers (spread across parser/planner/executor) were collapsed into a single phase-tagged traversal. On this unified base:

Volatile functions and NextValueExpr are now rejected at parse-analysis time
Runtime-constant offset checking for navigation is folded into the same walker
Subquery rejection is documented as intentionally stricter than the standard, with a future relaxation path recorded in comments

3. Standards Citation Normalization

Patch 0011 standardizes how the codebase references the SQL specification:

Canonical reference is now ISO/IEC 19075-5 (the Technical Report), not 9075-2
Chapter 6 (WINDOW / R020) cited before Chapter 4 (FROM / R010)
"STR06" pinned to §7.2.8
A normative-reference paragraph added to executor/README.rpr

4. Raw Tree Walker Gap Fixed

Ishii identified that raw_expression_tree_walker_impl() did not visit RPCommonSyntax or its children. While no current code path depends on this, the fix ensures future walker consumers won't silently skip RPR content. Henson validated with debug_raw_expression_coverage_test on an assert-enabled build.

5. Jian He's First Detailed NFA Review

Jian delivered a thorough review of execRPR.c covering 10 distinct issues:

Early exit optimization: Sequential patterns like (A B C D) should short-circuit on intermediate failure
ExecRPRFinalizeAllContexts possibly dead code: Only affects EXPLAIN output, not query correctness
RPRNFAState->next not NULLed after state field override — potential dangling pointer
Greedy/non-greedy comment confusion in grouped quantifier handling
nfa_add_state_unique return value unused — dead code or missing error handling
Visited-bitmap purpose unclear in nfa_add_state_unique
compareDepth = elem->depth + 1 rationale unexplained
Several documentation drift issues (stale initialAdvance reference)

Henson acknowledged most items will be accepted, with responses planned as numbered patches in v48.

6. Roadmap Clarification

Ishii explicitly enumerated scope boundaries:

In scope: PATTERN, DEFINE, NFA engine
Deferred (large): MEASURES clause, range-var-qualified row pattern variables
Deferred (moderate, post-commit candidates): SEEK clause, EMPTY MATCH, subquery relaxation in DEFINE

Minor/Housekeeping Items

High-water-mark NFA visited-bitmap reset changed from bulk memset to O(span_words) over touched range
DEFINE qualifier tri-classification (pattern var / range var / fall-through) improves error messages
Trailing enum commas, ereport() outer-paren removal, duplicate #include cleanup
README.rpr expanded with field descriptions, cross-references, and formal References section

Status at Month End

The patch is converging on v48, which will incorporate Ishii's walker fix, Henson's 0001–0015 follow-up series, and responses to Jian's review. No architectural changes are anticipated. The main open work is addressing Jian's NFA review items and finalizing documentation. Off-list collaboration between Jian and Henson is ongoing.

2026-06-01 · claude-opus-4-6

New in this round: Henson responds to Ishii on two architectural/naming topics + a static-revert fix

Two messages from Henson (2026-06-01) responding to Ishii's review notes (the Ishii messages themselves are not shown but their content is reconstructible from Henson's replies).

1. RPRContext consolidation deferred but framed against R010 (MATCH_RECOGNIZE)

Ishii apparently raised (or relayed from Jian) a suggestion to consolidate RPRContext fields. Henson agrees but explicitly defers it past v48, framing it as the structural prerequisite for R010 support:

"Rather than hold RPRContext back until then, I'd do the consolidation at a sensible point after v48 but shape it against R010 (what is shared versus per-context, how the fields group), so one engine core can later back both the R020 window path and R010 without a second reshape."

This is the first time the thread explicitly identifies the RPRContext reshape as the architectural bridge between R020 (window) and R010 (FROM/MATCH_RECOGNIZE). Previously R010 was just listed as "out of scope"; now there's a concrete design principle: the consolidation should be shaped so one engine core serves both features.

Henson credits Jian with the reframing insight — what looked like a code-size win is actually the shared-engine-core move that opens the R010 path.

2. "Prefix/suffix merging" terminology acknowledged as provisional

Ishii suggested "flattening" or "reduction" as better terminology for what's currently called "prefix/suffix merging" in the NFA compile phase. Henson agrees the current name isn't defensible on the merits but keeps it for v48 as a stopgap to contain ripple — renaming it would force renaming sibling Phase-1 rewrites ("consecutive variable / group / ALT merging") for consistency.

The interesting disclosure: Henson is preparing an academic paper on the algorithm with a university research group, and the settled terminology will come from that work. This is the first mention of formal academic publication for the NFA engine design.

3. `findTargetlistEntrySQL99()` reverted to static — with architectural history

Ishii caught that findTargetlistEntrySQL99() was exported as extern in parse_clause.h but no longer has cross-file callers. Henson confirms the revert and provides the full history of why it went extern:

Original design: DEFINE processing in parse_rpr.c called findTargetlistEntrySQL99() cross-file to add DEFINE expressions to the targetlist as resjunk entries.
The SIGSEGV it caused: When an RPR window and a plain window coexist, the non-RPR WindowAgg inherited targetlist entries containing RPRNavExpr nodes it cannot evaluate → crash.
The fix: Add only the Vars referenced by DEFINE (not the full expression), with an allpaths.c guard preventing column pruning. This eliminated the cross-file call, making the extern dead.

Henson also identifies a residual optimization gap: the allpaths.c guard is deliberately coarse — it also prevents removing a WindowAgg whose RPR WindowFuncs are all unused. Fixing this precisely requires restructuring remove_unused_subquery_outputs(), which touches all subqueries, not just RPR ones. This is deferred as a longer-term item.

No other activity

No responses from Jian or other reviewers. v48 has not yet been posted.

Row pattern recognition

Latest Update

New in this round: varId encoding decision settled, Jian's executor questions answered, empty-pattern rationale documented

1. varId Encoding: Option (3) Chosen — Reserve 16 Control Codes at 0xF0 Boundary

2. Jian's Executor Review Items — Three Responses from Henson

3. Empty Pattern Left Out of Scope — Formal Rationale Published

4. Quantifier Composition Over-Rejection Documented

5. Patch Organization: README.rpr Moving to Docs Patch

6. Minor Items

New in this round: Ishii flags non-ASCII character corruption in v47-0002 patch comments

No substantive progress

Row Pattern Recognition — May 2026 Monthly Summary

Overview

Key Developments

1. Recursive-CTE Prohibition (B7) — Resolved

2. DEFINE Walker Unification and Semantic Tightening

3. Standards Citation Normalization

4. Raw Tree Walker Gap Fixed

5. Jian He's First Detailed NFA Review

6. Roadmap Clarification

Minor/Housekeeping Items

Status at Month End

New in this round: Henson responds to Ishii on two architectural/naming topics + a static-revert fix

1. RPRContext consolidation deferred but framed against R010 (MATCH_RECOGNIZE)

2. "Prefix/suffix merging" terminology acknowledged as provisional

3. findTargetlistEntrySQL99() reverted to static — with architectural history

No other activity

3. `findTargetlistEntrySQL99()` reverted to static — with architectural history