Technical Analysis: Misleading Error Message in ProcessUtilitySlow T_CreateStatsStmt
Core Problem
This thread begins with a seemingly simple observation about a misleading error message but evolves into a significant architectural refactoring discussion about how CREATE STATISTICS is processed in PostgreSQL's utility command pipeline.
The Surface Issue
When a user writes:
CREATE STATISTICS alt_stat2 ON a, b FROM tftest(1);
where tftest is a table-returning function, the error message returned is:
ERROR: only a single relation is allowed in CREATE STATISTICS
This is misleading because:
- The user is providing a single relation (just the wrong kind)
- The actual problem is that a table function isn't a plain table name — it's not about cardinality of relations
The Deeper Architectural Issues
Upon investigation, multiple deeper problems were identified:
- Redundant relation resolution: The relation name is resolved (via RangeVarGetRelid) twice: once in ProcessUtilitySlow() and again inside CreateStatistics(). This is both wasteful and potentially dangerous (the CVE-2014-0062 pattern: resolving the same non-fully-qualified name twice might yield different results due to concurrent DDL).
- ProcessUtilitySlow doing too much work: The case T_CreateStatsStmt block in ProcessUtilitySlow performs parse analysis (transformStatsStmt) and relation opening, which violates the design intent of ProcessUtilitySlow as merely a dispatching switch.
- Double locking: Both ProcessUtilitySlow and CreateStatistics acquire ShareUpdateExclusiveLock on the same relation, which is redundant if we restructure the code to open the relation only once.
- The error check itself is mischaracterized: As Peter Eisentraut pointed out, the !IsA(rel, RangeVar) check doesn't examine relation kind (relkind) at all; it checks whether the FROM clause entry is a table name vs. some other grammar production (VALUES, JOIN, table function, XMLTABLE, etc.). The error message should reflect this distinction.
Proposed Solutions
1. Error Message Fix (Committed by Álvaro)
The immediate fix changed the error message to better reflect what the check actually validates:
ERROR: cannot create statistics on specified relation
DETAIL: CREATE STATISTICS only supports tables, materialized views, foreign tables, and partitioned tables.
Tom Lane argued against listing "partitioned tables" separately since they're generally subsumed under "tables." This was accepted.
2. Peter Eisentraut's Grammar-Level Fix (Committed)
Peter proposed that the error could be eliminated entirely by tightening the grammar from:
FROM from_list
to:
FROM qualified_name_list
This would reject non-table-name entries at parse time rather than requiring a runtime check. He also proposed a wording change making the error about "table names in the FROM clause." Tom Lane agreed on the wording but insisted on keeping ERRCODE_FEATURE_NOT_SUPPORTED rather than a syntax error code, since multi-relation statistics is intentionally left as syntax space for future features.
3. The Refactoring Patch (Under Discussion)
The larger refactoring (iterated through v1–v5+ by jian he, with significant v4 input from Álvaro) restructures the pipeline:
- Move parse analysis into CreateStatistics(): Instead of ProcessUtilitySlow calling transformStatsStmt and then CreateStatistics, the transform is done inside CreateStatistics itself.
- Open relation only once: The relation is opened with ShareUpdateExclusiveLock in CreateStatistics and the Relation object is passed directly to transformStatsStmt.
- Simplify ATPostAlterTypeParse(): Since CreateStatistics() now handles transformation internally, the special-case code in ATPostAlterTypeParse that calls transformStatsStmt separately becomes unnecessary.
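The change in control flow can be sketched with a toy model. Nothing below is PostgreSQL code: every toy_ name is hypothetical, and "opening a relation" is reduced to bumping a counter so the before and after shapes can be compared.

```c
#include <assert.h>

/* Toy statement node; "transformed" mirrors the stmt->transformed flag
 * mentioned in the thread, everything else is invented for illustration. */
typedef struct ToyStatsStmt
{
    const char *relname;
    int         transformed;
} ToyStatsStmt;

/* Counts how often the relation is looked up and opened. */
static int toy_open_count = 0;

static void
toy_open_relation(const char *relname)
{
    (void) relname;
    toy_open_count++;
}

static void
toy_transform(ToyStatsStmt *stmt)
{
    stmt->transformed = 1;
}

/* Before: the dispatcher opens the relation and runs parse analysis,
 * then the callee opens the same relation a second time. */
static void
toy_create_statistics_old(ToyStatsStmt *stmt)
{
    toy_open_relation(stmt->relname);
}

static void
toy_dispatch_old(ToyStatsStmt *stmt)
{
    toy_open_relation(stmt->relname);
    toy_transform(stmt);
    toy_create_statistics_old(stmt);
}

/* After: the dispatcher only dispatches; the callee opens the relation
 * once and performs the transform itself if it has not happened yet. */
static void
toy_create_statistics_new(ToyStatsStmt *stmt)
{
    toy_open_relation(stmt->relname);
    if (!stmt->transformed)
        toy_transform(stmt);
}

static void
toy_dispatch_new(ToyStatsStmt *stmt)
{
    toy_create_statistics_new(stmt);
}
```

In this model the old path opens the relation twice per statement and the new path once, which is the property the refactoring is after.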
Tradeoff: Error Position Reporting
jian he identified that the refactoring loses error position information in one edge case:
ALTER TABLE t ALTER COLUMN a SET DATA TYPE text;
When this triggers re-validation of a statistics expression like (a + 1 IS NOT NULL), the current code reports the error position in the ALTER TABLE statement. After refactoring, the position is lost. Álvaro argued this is acceptable because the position was misleading anyway — it points to a location in the ALTER TABLE statement that has nothing to do with the operator error.
4. Future Design Question: Pre/Post Transform Nodes
Álvaro's latest message (May 2026) raises whether CreateStatsStmt should be split into two separate node types: one for the pre-transform state (raw parser output) and one for the post-transform state. This is a pattern used elsewhere in PostgreSQL (e.g., RawStmt vs. planned statements) and would make the data flow clearer, avoiding the stmt->transformed flag pattern.
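As a rough illustration of the two-node idea, the sketch below uses separate C types for the raw and the transformed statement, so a function that needs the transformed form cannot be handed the raw one by mistake. All type and function names are hypothetical, and "transformation" is simplified here to resolving column names to 1-based attribute numbers; the real parse analysis does considerably more.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_COLS 8

/* Raw parser output: the relation and columns are still plain names. */
typedef struct ToyRawCreateStats
{
    const char *relname;
    const char *colnames[TOY_MAX_COLS];
    int         ncols;
} ToyRawCreateStats;

/* Post-transform form: names have been resolved to identifiers. */
typedef struct ToyCreateStats
{
    unsigned int relid;
    int          attnums[TOY_MAX_COLS];
    int          ncols;
} ToyCreateStats;

/* Minimal stand-in for an opened relation. */
typedef struct ToyTable
{
    unsigned int oid;
    const char  *cols[TOY_MAX_COLS];
    int          ncols;
} ToyTable;

/* Builds the transformed node from the raw one.
 * Returns 0 on success, -1 if a column name is unknown. */
static int
toy_transform_stats(const ToyRawCreateStats *raw, const ToyTable *tbl,
                    ToyCreateStats *out)
{
    out->relid = tbl->oid;
    out->ncols = 0;

    for (int i = 0; i < raw->ncols; i++)
    {
        int attnum = 0;

        for (int j = 0; j < tbl->ncols; j++)
        {
            if (strcmp(raw->colnames[i], tbl->cols[j]) == 0)
                attnum = j + 1;     /* attribute numbers start at 1 */
        }
        if (attnum == 0)
            return -1;
        out->attnums[out->ncols++] = attnum;
    }
    return 0;
}
```

With two types, there is no flag to forget to check: the type of the argument records whether transformation has happened.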
Key Technical Insights
The !IsA(rel, RangeVar) Check
The grammar for CREATE STATISTICS ... FROM from_list accepts the full from_list production, which can include JOINs, subqueries, VALUES, XMLTABLE, and table functions. The IsA(rel, RangeVar) check is a runtime guard that rejects everything except plain table names. This was intentionally designed to leave grammar space for future multi-relation statistics support.
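The nature of the check can be illustrated with a small model. The enum and functions below are toy stand-ins whose names only loosely mirror PostgreSQL's node tags, and the message text is illustrative wording, not the committed string.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node tags for things the grammar can produce in a FROM list. */
typedef enum ToyNodeTag
{
    TOY_RangeVar,       /* plain, possibly schema-qualified, table name */
    TOY_RangeFunction,  /* table function call such as tftest(1) */
    TOY_JoinExpr,       /* explicit JOIN */
    TOY_RangeSubselect  /* sub-SELECT or VALUES */
} ToyNodeTag;

typedef struct ToyNode
{
    ToyNodeTag tag;
} ToyNode;

/* Analogue of IsA(node, RangeVar): it compares the node tag only. */
static int
toy_is_range_var(const ToyNode *node)
{
    return node->tag == TOY_RangeVar;
}

/*
 * Returns NULL when the FROM item is acceptable, otherwise a message.
 * No catalog lookup happens here, so relkind is never consulted; the
 * test is purely about which grammar production built the node.
 */
static const char *
toy_check_from_item(const ToyNode *node)
{
    if (!toy_is_range_var(node))
        return "only plain table names are supported in the FROM clause";
    return NULL;
}
```

A single table function fails this check even though only one FROM item was given, which is why a message about the number of relations misses the point.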
Lock Semantics
When ProcessUtilitySlow resolves the name to an OID and acquires ShareUpdateExclusiveLock, and then CreateStatistics does relation_open(relid, ShareUpdateExclusiveLock) again, the second acquisition is redundant: the same lock level is already held on that relation. Tom Lane pointed out that the second call should use NoLock to make explicit that we expect the lock to already be held. The refactoring eliminates this issue entirely.
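The difference between re-requesting the lock and opening with NoLock can be sketched as follows. This is a toy model with hypothetical toy_ names, not the real lock manager; it only records how often a lock was requested and which mode is held.

```c
#include <assert.h>

/* Two toy lock modes; the numeric values are arbitrary in this model. */
typedef enum ToyLockMode
{
    TOY_NoLock = 0,
    TOY_ShareUpdateExclusiveLock = 1
} ToyLockMode;

typedef struct ToyRelation
{
    unsigned int oid;
    ToyLockMode  held;           /* lock mode currently held by this session */
    int          acquire_calls;  /* how often the lock manager was asked */
} ToyRelation;

/* Requests a lock; a repeated request for a held mode changes nothing
 * except the call counter. */
static void
toy_lock(ToyRelation *rel, ToyLockMode mode)
{
    rel->acquire_calls++;
    if (mode > rel->held)
        rel->held = mode;
}

/* Opens a relation. With TOY_NoLock the caller states that a lock is
 * already held, and the assertion documents and checks that expectation. */
static ToyRelation *
toy_relation_open(ToyRelation *rel, ToyLockMode mode)
{
    if (mode == TOY_NoLock)
    {
        assert(rel->held != TOY_NoLock);
    }
    else
    {
        toy_lock(rel, mode);
    }
    return rel;
}
```

Opening with the no-lock mode after an earlier acquisition leaves the request counter at one, while repeating the lock mode asks the lock manager a second time for something the session already has.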
The CVE-2014-0062 Pattern
Resolving a name to OID twice without holding a lock continuously between the two resolutions can be exploited: an attacker could rename objects between the two resolutions to cause the second resolution to target a different object. The refactoring closes this (theoretical) window by resolving once and propagating the OID/Relation forward.
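A minimal sketch of why a second lookup is risky, assuming a toy in-memory catalog (all names and OID values are made up): if the name-to-OID mapping changes between two lookups, the two results disagree, whereas a single lookup whose result is passed along cannot drift.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int ToyOid;
#define TOY_INVALID_OID 0

typedef struct ToyCatalogEntry
{
    const char *name;
    ToyOid      oid;
} ToyCatalogEntry;

/* Linear lookup by name; stands in for a name-to-OID resolution step. */
static ToyOid
toy_resolve(const ToyCatalogEntry *cat, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(cat[i].name, name) == 0)
            return cat[i].oid;
    }
    return TOY_INVALID_OID;
}

/* Simulates concurrent DDL that makes a name point at another object. */
static void
toy_swap_names(ToyCatalogEntry *a, ToyCatalogEntry *b)
{
    const char *tmp = a->name;

    a->name = b->name;
    b->name = tmp;
}
```

Resolving "t", swapping the two names, and resolving "t" again yields two different OIDs in this model; code that keeps using the first OID is unaffected by the rename.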
Multi-Relation Statistics (Hypothetical Future)
The thread touches on how multi-relation extended statistics might work. jian he argues that pg_statistic_ext would need to store all associated relation OIDs (not just one stxrelid) for expression deparsing to work. This is relevant because the grammar intentionally accepts from_list (not just a single table name) in anticipation of this feature.
Glossary Improvement (Side Topic)
The thread also led to an improvement of the "relation" entry in the PostgreSQL glossary. Tom Lane proposed clearer wording distinguishing the mathematical meaning ("a set of tuples" — the origin of "relational database") from the PostgreSQL-specific meaning ("an SQL object with a pg_class entry"). Álvaro committed this improvement.