ECPG: inconsistent behavior with the document in “GET/SET DESCRIPTOR.”

First seen: 2026-03-11 09:49:46+00:00 · Messages: 13 · Participants: 4

Latest Update

2026-06-01 · claude-opus-4-6

ECPG: Inconsistent Behavior with GET/SET DESCRIPTOR Multiple Header Items

Core Problem

The ECPG (Embedded SQL in C for PostgreSQL) precompiler has a bug where its parser accepts syntax that its code generator cannot properly handle, resulting in invalid C code that fails at the downstream C compiler stage rather than being caught during SQL preprocessing.

The Specific Bug

The ECPG documentation and grammar define GET DESCRIPTOR with the ability to specify multiple header items:

GET DESCRIPTOR descriptor_name :cvariable = descriptor_header_item [, ... ]

When a user writes:

EXEC SQL GET DESCRIPTOR d :desc_count1 = count, :desc_count2 = count;

The ECPG precompiler accepts this syntax (the parser raises no error), but the code generation function output_get_descr_header() mishandles the list and runs the variable names together, producing:

{ ECPGget_desc_header(__LINE__, "d", &(desc_count2desc_count1));

This generates a reference to a non-existent variable desc_count2desc_count1 — a garbled concatenation of both host variable names — which then fails at C compilation time.

Why This Matters Architecturally

This bug exposes a mismatch inside the ECPG preprocessor between two components:

  1. ECPGGetDescHeaderItems — the parser rule that accepts multiple comma-separated header item assignments
  2. output_get_descr_header() — the code generation function that only handles a single header item

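The grammar accepts a list, but the code generator was designed for a single item: it opens one ECPGget_desc_header() call and writes every assigned variable into that call's single &( ... ) argument, instead of emitting one API call per item or a multi-assignment form. The host variable names therefore run together into an invalid identifier, and, as the output above shows, in reverse source order.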
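The mechanism can be reproduced with a small C model. This is a sketch under stated assumptions, not ECPG's source: it assumes the parsed ":var = COUNT" items are kept in a linked list with the newest item at the head (consistent with the reversed order in the garbled output), and the names struct assignment, push_assignment and emit_get_descr_header_buggy are used here for illustration only. The descriptor name is passed already quoted, as it appears in the generated call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: each ":var = COUNT" item is pushed onto the FRONT
 * of a singly linked list, so the list ends up in reverse source order. */
struct assignment {
    const char *variable;
    struct assignment *next;
};

static struct assignment *push_assignment(struct assignment *head,
                                          struct assignment *node,
                                          const char *var)
{
    node->variable = var;
    node->next = head;   /* prepend: the newest item becomes the head */
    return node;
}

/* Models the faulty emission: ONE ECPGget_desc_header() call is opened and
 * every variable in the list is written inside its single &( ... ) argument,
 * so the names run together. desc_name is expected to be already quoted. */
static void emit_get_descr_header_buggy(char *out, size_t cap,
                                        const char *desc_name,
                                        const struct assignment *list)
{
    assert(out != NULL && cap > 0);
    size_t len = (size_t) snprintf(out, cap,
                                   "{ ECPGget_desc_header(__LINE__, %s, &(",
                                   desc_name);
    /* Stop early if the buffer filled up (or snprintf failed). */
    for (const struct assignment *a = list; a != NULL && len < cap; a = a->next)
        len += (size_t) snprintf(out + len, cap - len, "%s", a->variable);
    if (len < cap)
        snprintf(out + len, cap - len, "));");
}
```

Pushing desc_count1 and then desc_count2 and emitting into a 128-byte buffer yields { ECPGget_desc_header(__LINE__, "d", &(desc_count2desc_count1)); which matches the line reported in the thread.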

This is a classic case where grammar expressiveness exceeds backend capability, and the error surfaces at the wrong abstraction level (C compiler rather than ECPG preprocessor).

Proposed Solutions

Approach 1: Restrict the Grammar (Chosen Solution)

The submitted patch modifies the ECPG parser to reject multiple header items at the grammar level, making the precompiler emit a syntax error:

bytea.pgc:123: ERROR: syntax error at or near ","
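To make the effect of the restriction concrete, the following toy recognizer accepts exactly one ":cvariable = COUNT" item and rejects a trailing comma with the same message text. It is a hypothetical illustration, not the patch: the real change is made in ECPG's bison grammar, it handles only COUNT (the header item discussed here), and parse_single_header_item, skip_ws and syntax_error are invented names. Unlike PostgreSQL, it reports only the first character of the offending token.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Toy recognizer for the restricted form  :cvariable = COUNT  (one item).
 * NOT ECPG's grammar; it only models accepting a single header item and
 * rejecting anything that follows it. */
static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/* Writes a PostgreSQL-style message naming the offending character. */
static int syntax_error(const char *near, char *err, size_t errcap)
{
    if (*near == '\0')
        snprintf(err, errcap, "syntax error at end of input");
    else
        snprintf(err, errcap, "syntax error at or near \"%c\"", *near);
    return -1;
}

/* Returns 0 if s is exactly one ":name = COUNT" item (optionally followed
 * by ';'), otherwise -1 with a message in err. */
static int parse_single_header_item(const char *s, char *err, size_t errcap)
{
    static const char kw[] = "count";
    const char *p;

    assert(s != NULL && err != NULL && errcap > 0);
    p = skip_ws(s);
    if (*p != ':')
        return syntax_error(p, err, errcap);
    p++;
    if (!isalpha((unsigned char) *p) && *p != '_')
        return syntax_error(p, err, errcap);
    while (isalnum((unsigned char) *p) || *p == '_')
        p++;                                      /* host variable name */
    p = skip_ws(p);
    if (*p != '=')
        return syntax_error(p, err, errcap);
    p = skip_ws(p + 1);
    for (size_t i = 0; kw[i] != '\0'; i++, p++)   /* case-insensitive COUNT */
        if (tolower((unsigned char) *p) != kw[i])
            return syntax_error(p, err, errcap);
    if (isalnum((unsigned char) *p) || *p == '_')
        return syntax_error(p, err, errcap);      /* e.g. "countx" */
    p = skip_ws(p);
    if (*p == ';')
        p = skip_ws(p + 1);
    if (*p != '\0')
        return syntax_error(p, err, errcap);      /* a second item starts here */
    return 0;
}
```

With this rule, ":desc_count1 = count;" is accepted, while ":desc_count1 = count, :desc_count2 = count;" fails at the comma, so the problem is reported before any C code is generated.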

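This approach reports the error during ECPG preprocessing, with the source file and line number, instead of passing invalid C code on to the C compiler.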

Approach 2: Fix the Code Generator (Not Pursued)

An alternative would be to fix output_get_descr_header() to properly handle multiple assignments — e.g., emitting multiple ECPGget_desc_header() calls or a loop construct. This was discussed but rejected because:

  1. The SQL standard does not clearly define semantics for multiple COUNT assignments in SET DESCRIPTOR
  2. Oracle Pro*C treats multiple COUNT specifications as a syntax error
  3. No user has requested this feature in the years since PG14 introduced the bug
  4. The undefined semantics (especially for SET DESCRIPTOR with multiple COUNTs) make implementation questionable
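For comparison, a generator along the lines of Approach 2 could emit one ECPGget_desc_header() call per item, in source order. The sketch below is hypothetical and was not pursued in the thread: emit_get_descr_header_per_item is an invented name, it writes into a buffer rather than the preprocessor's output file, and it omits any error-handling code generated for EXEC SQL WHENEVER.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical Approach 2 generator: one ECPGget_desc_header() call per
 * ":var = COUNT" item instead of one call holding every variable.
 * vars[] is in source order; desc_name is expected to be already quoted.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int emit_get_descr_header_per_item(char *out, size_t cap,
                                          const char *desc_name,
                                          const char *const vars[],
                                          size_t nvars)
{
    assert(out != NULL && cap > 0);
    size_t len;
    int n = snprintf(out, cap, "{");
    if (n < 0 || (size_t) n >= cap)
        return -1;
    len = (size_t) n;
    for (size_t i = 0; i < nvars; i++) {
        n = snprintf(out + len, cap - len,
                     " ECPGget_desc_header(__LINE__, %s, &(%s));",
                     desc_name, vars[i]);
        if (n < 0 || (size_t) n >= cap - len)
            return -1;                  /* truncated: buffer too small */
        len += (size_t) n;
    }
    n = snprintf(out + len, cap - len, " }");
    if (n < 0 || (size_t) n >= cap - len)
        return -1;
    return 0;
}
```

For desc_count1 and desc_count2 this produces two separate calls, each storing the same COUNT value in one variable. That is valid C, but it also shows how little the multi-item form adds for GET DESCRIPTOR.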

Key Technical Details

Affected Components

Scope of the Bug

Testing Considerations

The patch initially had no regression tests because, as noted in the thread, "we cannot compile such a test program in the first place": the bug manifests as a C compilation failure, while the standard ECPG regression suite works by compiling and running test programs. However, Fujii Masao pointed out that ECPG already has TAP tests that check errors and warnings reported by the preprocessor itself, and provided an additional patch adding such tests.

Design Decision Rationale

The decision to restrict rather than extend is well-justified:

  1. Little practical value: Specifying COUNT more than once in GET DESCRIPTOR is redundant, since every variable would receive the same value. While harmless, it is not useful enough to justify the implementation effort.

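  2. SET DESCRIPTOR ambiguity: In SET DESCRIPTOR the header item appears on the left of the equals sign, so the multi-item form would be SET DESCRIPTOR ... COUNT = :v1, COUNT = :v2. Its semantics would be genuinely undefined, because nothing says which value wins. This argues against supporting the syntax.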

  3. Compatibility: Matching Pro*C's restriction ensures portability for users migrating embedded SQL code between systems.

  4. Minimal risk: Removing a broken feature that no one uses is safer than implementing new code generation logic.