Analysis: DuckDB's Extensible Parser as Inspiration for PostgreSQL
Core Problem
PostgreSQL's SQL parser is monolithic and tightly coupled to the server's grammar definition (gram.y). This makes it extremely difficult to:
- Add new syntax without modifying core PostgreSQL source code
- Support domain-specific languages or SQL dialects within PostgreSQL
- Allow extensions to introduce new statement types that are first-class citizens in the parser
The parser is one of the least extensible parts of PostgreSQL's architecture. While PostgreSQL has rich extension APIs for functions, operators, types, access methods, and even custom scan providers, the parser remains a hard boundary — extensions cannot introduce new grammar productions at runtime.
Context: DuckDB's Approach
The referenced article (DuckDB blog, November 2024) describes DuckDB's implementation of a runtime-extensible parser. DuckDB's approach allows extensions to:
- Register custom parser handlers that intercept statements the core parser cannot handle
- Introduce entirely new SQL statement types without modifying DuckDB's core grammar
- Report a normal syntax error when no extension claims a statement, leaving the core grammar's behavior unchanged
This is architecturally significant because it demonstrates a practical implementation of parser extensibility in a production analytical database system.
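The dispatch order can be modeled in a few lines. This is a minimal sketch assuming the fallback design the list describes: the core parser runs first, and registered extension parsers are consulted, in registration order, only when it fails. It is not DuckDB's actual API. `ParseError`, `Statement`, `core_parse`, `ParserChain`, and `cypher_ext` are hypothetical names, and the toy core grammar accepts only statements beginning with SELECT.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ParseError(Exception):
    """Raised when the toy core grammar rejects a statement (hypothetical)."""


@dataclass
class Statement:
    kind: str  # which parser produced it: "core" or an extension's name
    text: str


def core_parse(sql: str) -> Statement:
    # Toy stand-in for the built-in grammar: only SELECT is recognized.
    tokens = sql.split()
    if tokens and tokens[0].upper() == "SELECT":
        return Statement("core", sql)
    near = tokens[0] if tokens else "end of input"
    raise ParseError(f"syntax error at or near {near!r}")


# An extension parser returns a Statement if it claims the input, else None.
ExtensionParser = Callable[[str], Optional[Statement]]


class ParserChain:
    def __init__(self) -> None:
        self.extensions: List[ExtensionParser] = []

    def register(self, ext: ExtensionParser) -> None:
        self.extensions.append(ext)

    def parse(self, sql: str) -> Statement:
        try:
            return core_parse(sql)  # the core grammar is always tried first
        except ParseError as core_error:
            for ext in self.extensions:  # extensions run only on core failure
                stmt = ext(sql)
                if stmt is not None:
                    return stmt
            raise core_error  # nobody claimed it: report the core error


def cypher_ext(sql: str) -> Optional[Statement]:
    # Hypothetical extension that claims statements starting with MATCH.
    tokens = sql.split()
    if tokens and tokens[0].upper() == "MATCH":
        return Statement("cypher", sql)
    return None
```

In this model, statements the core grammar accepts never touch the extension list, so their parse cost is unchanged and the extra work happens only on the error path. Registration order also matters: if two extensions could both claim a statement, the one registered first wins.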
Historical PostgreSQL Context
This topic has surfaced multiple times in pgsql-hackers history:
- Hooks in the parser: PostgreSQL does have post_parse_analyze_hook, which allows extensions to inspect or modify the Query tree after parse analysis, but this doesn't help with introducing new syntax: a statement the grammar rejects raises a syntax error before the hook is ever reached.
- Raw parser hooks: There have been past discussions about adding a hook that could intercept raw input before or during parsing, allowing extensions to handle unrecognized statements.
- PL languages: PostgreSQL's procedural languages effectively have their own parsers, but these operate within the DO/function body context, not at the top-level SQL statement level.
- pg_query: The external libpg_query project extracts PostgreSQL's parser for use outside the server, but doesn't address extensibility.
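The point about post_parse_analyze_hook can be illustrated with a small model of the pipeline. PostgreSQL extensions conventionally install this hook in `_PG_init` by saving the previous value and calling it from their own hook, so several extensions can chain. The sketch below mimics that idiom in Python; `raw_parser`, `parse_analyze`, `install_hook`, and the dictionary "query" are simplified stand-ins, not PostgreSQL's C API.

```python
from typing import Callable, Dict, List, Optional

# Simplified stand-ins for PostgreSQL's pipeline (hypothetical, not the C API).
Query = Dict[str, str]
PostParseHook = Callable[[Query], None]

post_parse_analyze_hook: Optional[PostParseHook] = None
hook_log: List[str] = []


class GrammarError(Exception):
    pass


def raw_parser(sql: str) -> Query:
    # Fixed grammar: only SELECT and INSERT are accepted; anything else fails here.
    tokens = sql.split()
    if not tokens or tokens[0].upper() not in ("SELECT", "INSERT"):
        raise GrammarError(f"syntax error in {sql!r}")
    return {"command": tokens[0].upper()}


def parse_analyze(sql: str) -> Query:
    query = raw_parser(sql)  # the grammar check happens first
    if post_parse_analyze_hook is not None:
        post_parse_analyze_hook(query)  # the hook sees only accepted statements
    return query


def install_hook(name: str) -> None:
    """Chain a new hook in front of the existing one, as extensions do in _PG_init."""
    global post_parse_analyze_hook
    prev_hook = post_parse_analyze_hook

    def hook(query: Query) -> None:
        if prev_hook is not None:
            prev_hook(query)  # keep previously installed extensions working
        hook_log.append(f"{name}:{query['command']}")

    post_parse_analyze_hook = hook
```

Because `parse_analyze` calls `raw_parser` before any hook, a hook can observe or adjust accepted statements but has no way to accept `MATCH (n) RETURN n`; that is the gap a raw-parser hook would fill.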
Technical Tradeoffs
The fundamental tension for PostgreSQL is:
- Security and correctness: A monolithic, well-tested parser provides strong guarantees about what SQL is accepted. Extensible parsing introduces risk of ambiguous grammars or security bypasses.
- Performance: PostgreSQL's parser is generated by Bison at compile time, producing efficient LALR(1) parsing tables. Runtime extensibility could introduce overhead at parse time for every query.
- Compatibility: If extensions can introduce arbitrary syntax, it becomes harder to reason about SQL compatibility and portability.
- Practical need: Many use cases (e.g., DDL specific to an extension, graph query languages like openCypher, compatibility shims for other databases) would benefit substantially from parser extensibility.
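One hypothetical way a design could bound both the per-query cost and the ambiguity risk is to let each extension claim a set of leading keywords, protect the core grammar's keywords, and reject conflicting claims at registration time rather than at parse time. This scheme is offered only for illustration; `CORE_KEYWORDS`, `KeywordRegistry`, `claim`, and `lookup` are invented names, and the keyword set is a small sample.

```python
from typing import Callable, Dict, FrozenSet, Optional

# Sample of keywords owned by the core grammar; extensions may not claim them.
CORE_KEYWORDS: FrozenSet[str] = frozenset({"SELECT", "INSERT", "UPDATE", "DELETE", "CREATE"})

Handler = Callable[[str], str]


class KeywordConflict(Exception):
    pass


class KeywordRegistry:
    """Hypothetical registry mapping a statement's first keyword to one extension."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._handlers: Dict[str, Handler] = {}

    def claim(self, extension: str, keyword: str, handler: Handler) -> None:
        kw = keyword.upper()
        if kw in CORE_KEYWORDS:
            # Prevents an extension from hijacking core statements.
            raise KeywordConflict(f"{extension}: {kw} belongs to the core grammar")
        if kw in self._owners:
            # Ambiguity is detected once, at load time, not on every query.
            raise KeywordConflict(f"{extension}: {kw} already claimed by {self._owners[kw]}")
        self._owners[kw] = extension
        self._handlers[kw] = handler

    def lookup(self, sql: str) -> Optional[Handler]:
        # A single hash lookup on the first token, independent of how many
        # extensions are registered.
        tokens = sql.split()
        if not tokens:
            return None
        return self._handlers.get(tokens[0].upper())
```

The cost of this simplicity is expressiveness: first-keyword dispatch cannot handle extensions that add clauses inside existing statements (for example, a new option in the middle of a SELECT), which is the harder case a grammar-level mechanism would need to solve.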
Assessment
This is a very brief, link-sharing post rather than a formal proposal or patch submission. Pavel Stehule is pointing the community toward DuckDB's implementation as a reference design for a long-discussed capability. The thread generated no responses, suggesting either the community considers this a known-but-intractable problem, or the post didn't provide enough concrete proposal material to spark discussion.
For PostgreSQL to adopt something similar, a concrete proposal would need to address:
- Where in the parsing pipeline the hook would be inserted (before Bison, as a fallback, or as a pre-processor)
- How to handle grammar conflicts between extensions
- How extended parse nodes would flow through rewriting, planning, and execution
- Security implications of arbitrary parser extensions
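For the question of how extended parse nodes would flow through the system, PostgreSQL has a partial precedent: `ExtensibleNode` (declared in `nodes/extensible.h`) lets an extension register a named node type together with copy, equality, output, and read callbacks, so that generic routines such as `copyObject()` and `equal()` can process it. The Python model below imitates that idea for a hypothetical extension parse node. `ExtNode`, `NodeMethods`, `register_node_methods`, `copy_node`, `nodes_equal`, and `CypherMatchStmt` are invented names, and real support would also have to cover the rewriter, planner, and executor, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ExtNode:
    extnodename: str  # name used to find the registered callbacks
    payload: Dict[str, Any]  # extension-private fields


@dataclass(frozen=True)
class NodeMethods:
    copy: Callable[[ExtNode], ExtNode]
    equal: Callable[[ExtNode, ExtNode], bool]


_registry: Dict[str, NodeMethods] = {}


def register_node_methods(name: str, methods: NodeMethods) -> None:
    if name in _registry:
        raise ValueError(f"node type {name!r} already registered")
    _registry[name] = methods


def _methods_for(node: ExtNode) -> NodeMethods:
    try:
        return _registry[node.extnodename]
    except KeyError:
        # Generic code cannot handle a node type that nobody registered.
        raise TypeError(f"unrecognized extensible node type {node.extnodename!r}") from None


def copy_node(node: ExtNode) -> ExtNode:
    return _methods_for(node).copy(node)


def nodes_equal(a: ExtNode, b: ExtNode) -> bool:
    if a.extnodename != b.extnodename:  # different types are never equal
        return False
    return _methods_for(a).equal(a, b)


# A hypothetical extension registers callbacks for its MATCH statement node.
register_node_methods(
    "CypherMatchStmt",
    NodeMethods(
        copy=lambda n: ExtNode(n.extnodename, dict(n.payload)),
        equal=lambda a, b: a.payload == b.payload,
    ),
)
```

The design choice being illustrated is that generic infrastructure dispatches on a registered name instead of a compile-time node tag, so every stage that walks the tree must either go through such callbacks or reject the node.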
Relevance to PostgreSQL Architecture
The parser extensibility question is deeply connected to the maturity of PostgreSQL's extension ecosystem. As that ecosystem grows, the pressure to make the parser extensible increases: projects such as Citus, TimescaleDB, pgvector, and Apache AGE currently expose their features through functions, operators, and types, and some of those features would read more naturally as dedicated syntax. DuckDB's approach provides a concrete reference implementation that PostgreSQL developers can evaluate.