Technical Analysis: Remove Obsolete tupDesc Assignment in Extended Statistics
Core Problem
This patch addresses a code hygiene issue in PostgreSQL's extended statistics subsystem, specifically in the lookup_var_attr_stats() function. The function currently assigns a tupDesc (tuple descriptor) to VacAttrStats entries that are created for expressions in extended statistics objects. The patch author argues this assignment is vestigial — a remnant from an earlier code state that no longer serves a functional purpose.
Technical Context
Extended Statistics Architecture
PostgreSQL's extended statistics (introduced in PG10, with expression support added in PG14) allows users to create statistics objects that capture cross-column correlations, MCV lists, and n-distinct values across multiple columns and/or expressions. The relevant code path is:
lookup_var_attr_stats()— Looks up or createsVacAttrStatsentries for variables referenced in a statistics objectmake_build_data()— Constructs the data matrix used for building statistics (MCV, ndistinct, dependencies)examine_expression()— CreatesVacAttrStatsentries specifically for expression-based statisticsstatext_mcv_build()— Builds multi-column MCV lists
The Specific Issue
In lookup_var_attr_stats(), when processing expressions (as opposed to plain column references), the code copies vacatts[0]->tupDesc into the newly created VacAttrStats entry. The rationale was presumably that downstream consumers (like statext_mcv_build()) might need a tuple descriptor to interpret values.
However, the current code path reveals this is dead/obsolete:
make_build_data()usestupDesconly for regular columns (to callheap_getattr()for extracting column values from heap tuples)- Expressions are evaluated separately through
VacAttrStatsentries created byexamine_expression(), which computes expression values via the expression evaluation machinery — no tuple descriptor needed - The old XXX comment references
statext_mcv_build()as a consumer, but this no longer appears to match the actual code flow
Why This Matters Architecturally
- Misleading code: The assignment suggests a dependency that doesn't exist, making future maintenance harder
- Fragile implicit coupling: Copying
vacatts[0]->tupDesc(the tuple descriptor of the first attribute) into an expression's stats entry is semantically wrong — expressions don't belong to any single relation's tuple format in the same way columns do - Defense against future bugs: If future code incorrectly relies on this field being set for expression entries, it would be using a tuple descriptor that doesn't actually describe the expression's output, potentially leading to subtle data corruption or crashes
Proposed Solution
The patch is minimal:
- Remove the
tupDescassignment for expression-basedVacAttrStatsentries inlookup_var_attr_stats() - Remove the associated outdated XXX comment
- The author explicitly notes that if future code needs a tuple descriptor for expression entries, it should establish that dependency explicitly rather than relying on an implicit copy of
vacatts[0]->tupDesc
Risk Assessment
This is a low-risk cleanup patch:
- If the
tupDescfield were actually read for expression entries anywhere in the current code, removing it would cause a NULL pointer dereference (easily caught in testing) - The author's analysis that
make_build_data()only usestupDescfor regular columns appears sound based on the known code structure - No behavioral change is expected for any user-visible functionality
Classification
This is a code cleanup / dead code removal patch. It does not change behavior, improve performance, or fix a user-visible bug. Its value is in improving code clarity and maintainability for the extended statistics subsystem.