SQL Property Graph Queries (SQL/PGQ) - Deep Technical Analysis
Problem Statement
This thread tracks the implementation of SQL Property Graph Queries (SQL/PGQ) as specified in SQL:2023 (ISO/IEC 9075-16). The goal is to allow PostgreSQL users to define property graphs over existing relational tables and query them using graph pattern matching syntax via GRAPH_TABLE(), without introducing a separate graph storage engine.
The core architectural insight is that SQL/PGQ is designed as syntactic sugar over relational structures — property graphs are metadata overlays on existing tables, and graph pattern matching queries are rewritten into standard SQL joins and unions. This aligns with PostgreSQL's existing rewriter-based view expansion model.
Architectural Design
Storage Model
Property graphs are stored as catalog metadata (new pg_propgraph_* system catalogs) that map:
- Vertex tables → existing relations with a designated key
- Edge tables → existing relations with source/destination foreign key references to vertex tables
- Labels → logical groupings that can span multiple element tables
- Properties → columns of the underlying tables exposed through the graph schema
No new storage engine or index type is introduced. The RELKIND_PROPGRAPH relation kind is added to pg_class but carries no tuple data.
Query Processing Pipeline
- Parser (
parse_graphtable.c): TransformsGRAPH_TABLE(... MATCH ... COLUMNS ...)syntax into anRTE_GRAPH_TABLErange table entry containingGraphElementPatternnodes - Rewriter (
rewriteGraphTable.c): ExpandsRTE_GRAPH_TABLEinto a lateral subquery containing joins (edge-vertex link quals) and unions (when labels span multiple tables) - Planner/Executor: Processes the resulting standard relational query with no graph-specific nodes
Key Design Decision: Rewriter vs. Planner
Andres Freund raised early concern about using the rewrite system, arguing it bars the planner from making graph-aware optimization decisions. Peter Eisentraut defended the approach:
- PGQ was intentionally designed in the standard to work this way — as view-like expansion to joins/unions
- Security infrastructure (privileges, security barriers, RLS) is already implemented in the rewriter path
- Aligning with view expansion allows reuse of existing security mechanisms
Tomas Vondra and Ashutosh Bapat noted that while this approach may lose semantic information useful for specialized optimizations (custom index types, specialized executor nodes), those optimizations can be added later without changing the fundamental architecture. The annotations about graph structure could be preserved for future planner enhancements.
Security Model
The implementation adopts security_invoker semantics exclusively:
- Users need
SELECTprivilege on the property graph to reference it inGRAPH_TABLE() - Access to underlying element tables uses the current user's privileges, not the graph owner's
- RLS policies on element tables are enforced transparently
- Security_definer semantics are explicitly avoided because property graphs cannot restrict which functions/operators are used in
GRAPH_TABLE(), making privilege escalation trivially possible
This differs from the SQL standard's apparent intent (which seems closer to security_definer — once you have SELECT on the graph, no further table privileges needed), but is the safer default for PostgreSQL's security model.
Key Technical Challenges Resolved
1. Parser Hook Conflicts (Crash in PL/pgSQL)
The initial implementation used parser hooks (p_pre_columnref_hook, p_post_columnref_hook) for resolving graph property references. This crashed when GRAPH_TABLE was used inside PL/pgSQL functions because PL/pgSQL uses the same hooks. Fix: Replace hooks with a p_graph_table_pstate member in ParseState.
2. Lock Acquisition for Graph Tables
Property graph relations referenced in GRAPH_TABLE were not being locked during AcquireRewriteLocks(), causing assertion failures after commit 525392d57 added lock verification. Fix: Handle RTE_GRAPH_TABLE in AcquireRewriteLocks() similar to RTE_RELATION.
3. Edge-Vertex Link Quals
Constructing the join conditions between edges and their adjacent vertices requires:
- Finding the appropriate equality operator for key matching
- Storing the operator OID in
pg_propgraph_elementcatalog - Properly handling collation and typmod propagation
4. Any-Direction Edge Matching (-[]-)
Initially only right-directed (-[]->) and left-directed (<-[]-) edges worked. Any-direction matching requires generating OR conditions for both directions when source and destination come from the same table.
5. Cyclic Path Patterns
Patterns like (a)->(b)->(a) where element variables repeat require special handling — the repeated variable constrains the path to start and end at the same vertex. A path_factor abstraction combines all GraphElementPattern nodes sharing a variable.
6. Implicit Vertex Patterns (Post-commit)
Patterns like ()-> (trailing edge without following vertex) caused crashes. Per SQL/PGQ standard feature G047, implicit empty vertex patterns should be inserted. The implementation adds them during transformation with an implicit flag so ruleutils can reconstruct the original syntax.
7. Non-local Element Variable References
The standard's feature G041 allows element WHERE clauses to reference variables from other elements. The initial implementation only passed list_make1(pe) (current element's mapping) to property reference resolution, causing crashes on cross-references. After discussion, the decision was to reject all non-local references (neither forward nor backward) until G041 is properly implemented, to avoid half-implementations and future backward compatibility issues.
8. Property Graph Opening (parserOpenPropGraph)
Rather than adding RELKIND_PROPGRAPH to table_open()'s accepted relkinds (which would affect all table-opening code), a separate parserOpenPropGraph() function was created — cleaner separation that provides better error messages.
Catalog Invalidation
Initial implementation hardcoded PGQ catalogs into src/backend/utils/cache/inval.c (alongside fundamental catalogs like pg_class and pg_index). Peter flagged this as inappropriate. The resolution was to call CacheInvalidateRelcacheByRelid() with the property graph's OID at appropriate places in AlterPropGraph(), since CREATE/DROP already touch pg_class and trigger normal invalidation.
Features Explicitly Deferred
- Quantified path patterns (
-[*1..3]->) — requires WITH RECURSIVE rewrite or custom executor node - Shortest path — requires bidirectional BFS executor node for production use
- Label conjunction (
a IS Person & Employee) - Security_definer property graphs
<-[]->(full edge left or right) — feature G043- Implicit vertex patterns — feature G047
- Non-local element variable references — feature G041
- All properties reference (
a.*in COLUMNS) — partially implemented in follow-up patch
Compatibility Assessment (vs Oracle)
Per Henson Choi's cross-verification:
- PG advantages: ALTER PROPERTY GRAPH, TEMPORARY graphs, LATERAL + GRAPH_TABLE, NODE/RELATIONSHIP synonyms, static GRAPH_TABLE in PL/pgSQL
- Oracle advantages:
a.*in COLUMNS, broader query feature coverage (quantifiers, shortest path) - Semantic difference: PG allows label sharing across tables; Oracle restricts labels to one table
Code Quality Metrics
- ~91.4% line coverage across PGQ-specific files
- No memory leaks detected via Valgrind
- Clean separation: most new code in dedicated files (
propgraphcmds.c,rewriteGraphTable.c,parse_graphtable.c) - Minimal intersection with existing critical code paths