ECPG: Inconsistent Behavior with GET/SET DESCRIPTOR Multiple Header Items
Core Problem
The ECPG (Embedded SQL in C for PostgreSQL) precompiler has a bug where its parser accepts syntax that its code generator cannot properly handle, resulting in invalid C code that fails at the downstream C compiler stage rather than being caught during SQL preprocessing.
The Specific Bug
The ECPG documentation and grammar define GET DESCRIPTOR with the ability to specify multiple header items:
GET DESCRIPTOR descriptor_name :cvariable = descriptor_header_item [, ... ]
When a user writes:
EXEC SQL GET DESCRIPTOR d :desc_count1 = count, :desc_count2 = count;
The ECPG precompiler accepts this syntax (the parser handles it fine), but the output/code generation phase (output_get_descr_header()) incorrectly concatenates the variable names, producing:
{ ECPGget_desc_header(__LINE__, "d", &(desc_count2desc_count1));
This generates a reference to a non-existent variable desc_count2desc_count1 — a garbled concatenation of both host variable names — which then fails at C compilation time.
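The failure mode can be reproduced outside ECPG with a small, hypothetical model. This sketch is not the ECPG source: collect_item() and emit_get_desc_header() are illustrative names, and the assumption that each parsed item prepends its variable name to one shared buffer is chosen only because it reproduces the ordering seen in the generated line above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not ECPG code: every parsed header item prepends
 * its host variable name to one shared buffer, with no separator. */
void collect_item(char *acc, size_t accsz, const char *var_name)
{
    char tmp[256];

    assert(acc != NULL && var_name != NULL);
    snprintf(tmp, sizeof tmp, "%s%s", var_name, acc);
    snprintf(acc, accsz, "%s", tmp);
}

/* Hypothetical emitter that assumes acc holds exactly one identifier
 * and pastes it into the call unchanged. */
void emit_get_desc_header(char *out, size_t outsz,
                          const char *desc_name, const char *acc)
{
    snprintf(out, outsz,
             "{ ECPGget_desc_header(__LINE__, \"%s\", &(%s));",
             desc_name, acc);
}
```

Calling collect_item() once for desc_count1 and once for desc_count2, then passing the buffer to emit_get_desc_header() with descriptor name "d", yields the same invalid line as the real precompiler output. Nothing between the two steps checks that the buffer holds a single identifier.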
Why This Matters Architecturally
This bug exposes a layering violation in ECPG's internal architecture. There is a disconnect between two components:
- ECPGGetDescHeaderItems: the parser rule that accepts multiple comma-separated header item assignments
- output_get_descr_header(): the code generation function that only handles a single header item
The parser grammar allows a list, but the code generator was never implemented to iterate over that list and emit multiple API calls or a multi-assignment form. Instead, it naively concatenates the string representations, producing invalid identifiers.
This is a classic case where grammar expressiveness exceeds backend capability, and the error surfaces at the wrong abstraction level (C compiler rather than ECPG preprocessor).
Proposed Solutions
Approach 1: Restrict the Grammar (Chosen Solution)
The submitted patch modifies the ECPG parser to reject multiple header items at the grammar level, making the precompiler emit a syntax error:
bytea.pgc:123: ERROR: syntax error at or near ","
This approach:
- Removes the [, ...] repetition from the grammar rules for both GET and SET DESCRIPTOR header items
- Updates documentation to match the restricted syntax
- Ensures errors are caught at the correct stage (preprocessing, not compilation)
- Aligns with Oracle Pro*C behavior, which also restricts COUNT to a single specification
Approach 2: Fix the Code Generator (Not Pursued)
An alternative would be to fix output_get_descr_header() to properly handle multiple assignments — e.g., emitting multiple ECPGget_desc_header() calls or a loop construct. This was discussed but rejected because:
- The SQL standard does not clearly define semantics for multiple COUNT assignments in SET DESCRIPTOR
- Oracle Pro*C treats multiple COUNT specifications as a syntax error
- No user has requested this feature in the years since PG14 introduced the bug
- The undefined semantics (especially for SET DESCRIPTOR with multiple COUNTs) make implementation questionable
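For comparison, the rejected alternative for GET DESCRIPTOR could have looked roughly like the following: walk the list of host variables and emit one ECPGget_desc_header() call per item. This is only a sketch under stated assumptions. The function name emit_get_desc_headers(), its signature, and the closing brace on each emitted line are hypothetical and do not come from the patch or from the ECPG sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of Approach 2: emit one ECPGget_desc_header() call
 * per host variable instead of concatenating the names.
 * Returns the number of characters written, or -1 if out is too small. */
int emit_get_desc_headers(char *out, size_t outsz, const char *desc_name,
                          const char *const *vars, size_t nvars)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    for (size_t i = 0; i < nvars; i++)
    {
        /* The trailing "}" is an assumption made to keep each line balanced. */
        int n = snprintf(out + used, outsz - used,
                         "{ ECPGget_desc_header(__LINE__, \"%s\", &(%s));}\n",
                         desc_name, vars[i]);

        if (n < 0 || (size_t) n >= outsz - used)
            return -1;          /* output would have been truncated */
        used += (size_t) n;
    }
    return (int) used;
}
```

For the two variables in the example statement, this produces two separate calls, each taking the address of one real variable. The sketch says nothing about SET DESCRIPTOR, where the question of which COUNT value should win remains open, and that open question is one reason this route was not taken.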
Key Technical Details
Affected Components
- src/interfaces/ecpg/preproc/: the ECPG grammar and output functions
- Specifically: the grammar rules for descriptor header items and the output_get_descr_header() function
Scope of the Bug
- Affects PostgreSQL versions 14 through 18
- Impacts both GET DESCRIPTOR and SET DESCRIPTOR statements
- Only affects the header-level items (COUNT), not the VALUE-level items (which have a different code path)
Testing Considerations
The patch initially had no regression tests because "we cannot compile such a test program in the first place" — the bug manifests as a C compilation failure, which is hard to test in the standard regression framework. However, Fujii Masao identified that ECPG already has TAP tests that can detect errors/warnings from the preprocessor itself, and provided an additional patch adding such tests.
Design Decision Rationale
The decision to restrict rather than extend is well-justified:
- Principle of least surprise: Specifying COUNT multiple times in GET DESCRIPTOR is semantically redundant (you'd get the same value in multiple variables). While harmless, it's not useful enough to warrant implementation effort.
- SET DESCRIPTOR ambiguity: For SET DESCRIPTOR ... :v1 = count, :v2 = count, the semantics would be truly undefined: which value wins? This argues against supporting the syntax.
- Compatibility: Matching Pro*C's restriction ensures portability for users migrating embedded SQL code between systems.
- Minimal risk: Removing a broken feature that no one uses is safer than implementing new code generation logic.