SQL Property Graph Queries (SQL/PGQ)

First seen: 2024-02-16 14:53:11+00:00 · Messages: 211 · Participants: 19

Latest Update

2026-05-18 · claude-opus-4-6

SQL Property Graph Queries (SQL/PGQ) - Deep Technical Analysis

Problem Statement

This thread tracks the implementation of SQL Property Graph Queries (SQL/PGQ) as specified in SQL:2023 (ISO/IEC 9075-16). The goal is to allow PostgreSQL users to define property graphs over existing relational tables and query them using graph pattern matching syntax via GRAPH_TABLE(), without introducing a separate graph storage engine.

The core architectural insight is that SQL/PGQ is designed as syntactic sugar over relational structures — property graphs are metadata overlays on existing tables, and graph pattern matching queries are rewritten into standard SQL joins and unions. This aligns with PostgreSQL's existing rewriter-based view expansion model.

Architectural Design

Storage Model

Property graphs are stored as catalog metadata (new pg_propgraph_* system catalogs) that map:

No new storage engine or index type is introduced. The RELKIND_PROPGRAPH relation kind is added to pg_class but carries no tuple data.

Query Processing Pipeline

  1. Parser (parse_graphtable.c): Transforms GRAPH_TABLE(... MATCH ... COLUMNS ...) syntax into an RTE_GRAPH_TABLE range table entry containing GraphElementPattern nodes
  2. Rewriter (rewriteGraphTable.c): Expands RTE_GRAPH_TABLE into a lateral subquery containing joins (edge-vertex link quals) and unions (when labels span multiple tables)
  3. Planner/Executor: Processes the resulting standard relational query with no graph-specific nodes

Key Design Decision: Rewriter vs. Planner

Andres Freund raised early concern about using the rewrite system, arguing it bars the planner from making graph-aware optimization decisions. Peter Eisentraut defended the approach:

Tomas Vondra and Ashutosh Bapat noted that while this approach may lose semantic information useful for specialized optimizations (custom index types, specialized executor nodes), those optimizations can be added later without changing the fundamental architecture. The annotations about graph structure could be preserved for future planner enhancements.

Security Model

The implementation adopts security_invoker semantics exclusively:

This differs from the SQL standard's apparent intent (which seems closer to security_definer — once you have SELECT on the graph, no further table privileges needed), but is the safer default for PostgreSQL's security model.

Key Technical Challenges Resolved

1. Parser Hook Conflicts (Crash in PL/pgSQL)

The initial implementation used parser hooks (p_pre_columnref_hook, p_post_columnref_hook) for resolving graph property references. This crashed when GRAPH_TABLE was used inside PL/pgSQL functions because PL/pgSQL uses the same hooks. Fix: Replace hooks with a p_graph_table_pstate member in ParseState.

2. Lock Acquisition for Graph Tables

Property graph relations referenced in GRAPH_TABLE were not being locked during AcquireRewriteLocks(), causing assertion failures after commit 525392d57 added lock verification. Fix: Handle RTE_GRAPH_TABLE in AcquireRewriteLocks() similar to RTE_RELATION.

3. Edge-Vertex Link Quals

Constructing the join conditions between edges and their adjacent vertices requires:

4. Any-Direction Edge Matching (-[]-)

Initially only right-directed (-[]->) and left-directed (<-[]-) edges worked. Any-direction matching requires generating OR conditions for both directions when source and destination come from the same table.

5. Cyclic Path Patterns

Patterns like (a)->(b)->(a) where element variables repeat require special handling — the repeated variable constrains the path to start and end at the same vertex. A path_factor abstraction combines all GraphElementPattern nodes sharing a variable.

6. Implicit Vertex Patterns (Post-commit)

Patterns like ()-> (trailing edge without following vertex) caused crashes. Per SQL/PGQ standard feature G047, implicit empty vertex patterns should be inserted. The implementation adds them during transformation with an implicit flag so ruleutils can reconstruct the original syntax.

7. Non-local Element Variable References

The standard's feature G041 allows element WHERE clauses to reference variables from other elements. The initial implementation only passed list_make1(pe) (current element's mapping) to property reference resolution, causing crashes on cross-references. After discussion, the decision was to reject all non-local references (neither forward nor backward) until G041 is properly implemented, to avoid half-implementations and future backward compatibility issues.

8. Property Graph Opening (parserOpenPropGraph)

Rather than adding RELKIND_PROPGRAPH to table_open()'s accepted relkinds (which would affect all table-opening code), a separate parserOpenPropGraph() function was created — cleaner separation that provides better error messages.

Catalog Invalidation

Initial implementation hardcoded PGQ catalogs into src/backend/utils/cache/inval.c (alongside fundamental catalogs like pg_class and pg_index). Peter flagged this as inappropriate. The resolution was to call CacheInvalidateRelcacheByRelid() with the property graph's OID at appropriate places in AlterPropGraph(), since CREATE/DROP already touch pg_class and trigger normal invalidation.

Features Explicitly Deferred

Compatibility Assessment (vs Oracle)

Per Henson Choi's cross-verification:

Code Quality Metrics