Fix Small Issues of pg_restore_extended_stats()
Technical Context
pg_restore_extended_stats() is part of PostgreSQL's statistics import/export infrastructure, which allows pg_dump/pg_restore to preserve extended statistics across dump-and-reload cycles. Extended statistics (created via CREATE STATISTICS) track multi-column correlations, n-distinct values, and expression statistics that the planner uses for cardinality estimation. The ability to inject these statistics programmatically is also a supported use-case for users who want to manually set optimizer hints without waiting for ANALYZE.
This thread identifies three bugs in the implementation, two of which have subtle correctness implications for users relying on the statistics injection API.
Core Problem #1: Prefix-Matching Bug in Key Validation
The most architecturally significant bug is in key_in_expr_argnames(), which validates JSON keys provided in the exprs argument. The function uses strncmp() with the input key's length as the comparison bound:
if (strncmp(extexprargname[i], key->val.string.val, key->val.string.len) == 0)
return true;
This means any input string that is a prefix of a valid key name passes validation. For example:
"a" matches "avg_width" (prefix match)
"correlatio" matches "correlation" (off-by-one typo)
"n" would match "n_distinct" or "null_frac"
The downstream effect is nuanced: the invalid key is never actually used, because later filtering catches the mismatch, but the function still returns true (success) without any warning. The result is silent data loss: a user who means to set "correlation" but types "correlatio" gets no error, and the correlation value is simply never imported. The planner then operates without the intended statistical hint, potentially producing suboptimal plans with no diagnostic trail.
Fix
The fix is straightforward: add a length comparison alongside strncmp(). The valid key's length must equal key->val.string.len for a true match. This is a common pattern when comparing JsonbValue strings (which are not null-terminated) against C strings.
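The difference between the two comparisons can be reproduced outside PostgreSQL. The following is a minimal sketch, not the committed patch: KeyString is a hypothetical stand-in for the string payload of a JsonbValue, the valid_keys array holds only the names mentioned above rather than the full list from the source tree, and both function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a JsonbValue string: bytes plus an explicit
 * length, with no guarantee of a terminating NUL. */
typedef struct
{
    const char *val;
    int         len;
} KeyString;

/* Illustrative subset of key names; the real list is in the PostgreSQL
 * source and is not reproduced here. */
static const char *const valid_keys[] = {
    "avg_width", "correlation", "n_distinct", "null_frac"
};

#define NUM_VALID_KEYS (sizeof(valid_keys) / sizeof(valid_keys[0]))

/* Buggy shape: the comparison is bounded only by the input length, so any
 * prefix of a valid key is accepted. */
static bool
key_is_valid_prefix_bug(const KeyString *key)
{
    assert(key != NULL && key->val != NULL);
    for (size_t i = 0; i < NUM_VALID_KEYS; i++)
    {
        if (strncmp(valid_keys[i], key->val, (size_t) key->len) == 0)
            return true;
    }
    return false;
}

/* Fixed shape: the lengths must agree before the bytes are compared. */
static bool
key_is_valid_exact(const KeyString *key)
{
    assert(key != NULL && key->val != NULL);
    for (size_t i = 0; i < NUM_VALID_KEYS; i++)
    {
        if (strlen(valid_keys[i]) == (size_t) key->len &&
            memcmp(valid_keys[i], key->val, (size_t) key->len) == 0)
            return true;
    }
    return false;
}
```

With a key of {"correlatio", 10}, the first function accepts the input and the second rejects it, while {"correlation", 11} is accepted by both.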
Core Problem #2: Wrong Variable in Error Message
When the number of JSON array elements in exprs doesn't match the number of expressions in the statistics object, the error message incorrectly reports num_root_elements (the count the user provided) instead of numexprs (the count that is required):
errmsg("could not parse \"%s\": incorrect number of elements (%d required)",
argname, num_root_elements) /* should be numexprs */
This is a simple variable swap bug. If the user supplies 3 elements for a statistics object that has 1 expression, the message says "3 required" when only 1 is required, which actively misleads the user about what correction is needed.
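The corrected behavior can be sketched in standalone C. This is an illustration under assumptions: check_expr_count is a hypothetical helper, and snprintf into a caller-supplied buffer stands in for PostgreSQL's ereport()/errmsg() machinery. Only the message text and the two variable names come from the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true when the counts agree; otherwise it
 * writes the diagnostic into msg and returns false. */
static bool
check_expr_count(const char *argname, int num_root_elements, int numexprs,
                 char *msg, size_t msglen)
{
    assert(argname != NULL && msg != NULL);

    if (num_root_elements == numexprs)
        return true;

    /* Report the count the statistics object requires (numexprs), not the
     * count the caller happened to supply (num_root_elements). */
    snprintf(msg, msglen,
             "could not parse \"%s\": incorrect number of elements (%d required)",
             argname, numexprs);
    return false;
}
```

Calling it with 3 supplied elements and 1 required expression yields a message ending in "(1 required)", which tells the user what to change.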
Core Problem #3: Memory Leak on Error Paths
In pg_clear_extended_stats(), a heap tuple fetched via catalog lookup is freed with heap_freetuple(tup) only on the success path. Two early-return error paths (after warnings) skip this cleanup. While PostgreSQL's memory context system will eventually reclaim this memory when the current context is reset, the inconsistency with the explicit free at the bottom suggests the intent was to always clean up. Michael Paquier confirms the !IsValid() case has nothing to free (the lookup failed, so no tuple was allocated), but the stxrelid mismatch case does hold a valid tuple that should be freed.
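The cleanup discipline described here can be sketched without the PostgreSQL headers. Everything below is hypothetical scaffolding: FakeTuple, fake_lookup(), fake_freetuple(), and clear_stats_sketch() are invented names, malloc/free stand in for the catalog lookup and heap_freetuple(), and the live_tuples counter exists only so the leak is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the catalog tuple of a statistics object. */
typedef struct
{
    unsigned int stxrelid;
} FakeTuple;

/* Number of tuples currently allocated and not yet freed. */
static int live_tuples = 0;

/* Stand-in for the catalog lookup: returns NULL when nothing is found. */
static FakeTuple *
fake_lookup(unsigned int stored_relid, bool found)
{
    FakeTuple *tup;

    if (!found)
        return NULL;
    tup = malloc(sizeof(FakeTuple));
    if (tup == NULL)
        abort();
    tup->stxrelid = stored_relid;
    live_tuples++;
    return tup;
}

/* Stand-in for heap_freetuple(). */
static void
fake_freetuple(FakeTuple *tup)
{
    assert(tup != NULL);
    free(tup);
    live_tuples--;
}

static bool
clear_stats_sketch(unsigned int expected_relid, unsigned int stored_relid,
                   bool found)
{
    FakeTuple *tup = fake_lookup(stored_relid, found);

    /* Lookup failed: no tuple was allocated, so there is nothing to free. */
    if (tup == NULL)
        return false;

    /* Relation mismatch: a valid tuple is held, so free it before leaving. */
    if (tup->stxrelid != expected_relid)
    {
        fake_freetuple(tup);
        return false;
    }

    /* ... the actual clearing work would happen here ... */

    fake_freetuple(tup);
    return true;
}
```

After any of the three outcomes (lookup failure, relation mismatch, success), live_tuples is back at zero, which is the property the explicit free on the mismatch path restores.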
Design Implications
These bugs highlight a recurring challenge in PostgreSQL's C codebase: manual string comparison with JsonbValue types. The jbvString representation stores length separately from the character data (strings are not null-terminated in jsonb's internal format), making naive use of strncmp() a footgun. A more robust pattern would be a helper macro or function that always checks both content and length equality.
The statistics injection API is relatively new (extended stats import was added in the PG17/18 timeframe), and these edge cases reflect the difference between the pg_dump code path, which always generates correct keys, and the user-facing API, which must handle arbitrary input defensively. Michael Paquier's remark that stats injection is a supported use-case confirms the project's commitment to making this API robust beyond the pg_dump pathway.