Fix Incorrect Size Check in statext_dependencies_deserialize
Core Problem
The function statext_dependencies_deserialize() in PostgreSQL's extended statistics subsystem contains a bug in its sanity check that validates the size of incoming serialized bytea data before deserialization. The issue is a semantic mismatch between what the validation macro expects and what it's being given.
Technical Details
PostgreSQL's extended statistics system supports functional dependencies (CREATE STATISTICS ... (dependencies)), which track probabilistic relationships between columns. These statistics are serialized into the pg_statistic_ext_data catalog and deserialized when needed by the planner.
During deserialization in statext_dependencies_deserialize():
- The code reads ndeps (the number of dependency entries) from the serialized header.
- It then performs a sanity check to ensure the bytea is at least large enough to contain the claimed data.
- The bug: The check uses SizeOfItem(ndeps), which computes the size of a single dependency item that has ndeps attributes. This is semantically wrong: ndeps here represents the count of dependency entries, not the number of attributes in one entry.
- The fix: It should use MinSizeOfItems(ndeps), which correctly computes header_size + ndeps * minimum_single_item_size, i.e., the minimum total size needed to hold ndeps minimally sized dependency items (each with the minimum number of attributes, which is 2).
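The gap between the two macros is easier to see with numbers. The definitions below are a minimal sketch and an assumption, modeled on the layout described above (a three-word header, then per item a double-precision degree, an attribute count, and the attribute numbers). The real definitions live in PostgreSQL's extended statistics sources and may differ in detail. The helper names buggy_min_size and fixed_min_size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: attribute numbers are 16-bit integers. */
typedef int16_t AttrNumber;

/* Assumed header layout: magic, type, ndeps (three 32-bit words). */
#define SizeOfHeader        (3 * sizeof(uint32_t))

/* Assumed item layout: degree (double), attribute count, attributes. */
#define SizeOfItem(natts) \
    (sizeof(double) + sizeof(AttrNumber) * (1 + (natts)))

/* A dependency needs at least two attributes. */
#define MinSizeOfItem       SizeOfItem(2)

/* Smallest possible payload that can hold ndeps dependencies. */
#define MinSizeOfItems(ndeps) \
    (SizeOfHeader + (ndeps) * MinSizeOfItem)

/* Hypothetical helper: bound used by the buggy check, which treats
 * ndeps as if it were the attribute count of a single item. */
static size_t buggy_min_size(uint32_t ndeps)
{
    return SizeOfItem(ndeps);
}

/* Hypothetical helper: bound used by the corrected check. */
static size_t fixed_min_size(uint32_t ndeps)
{
    return MinSizeOfItems(ndeps);
}
```

Under these assumptions, and with an 8-byte double, ndeps = 2 gives a buggy bound of 14 bytes against a corrected bound of 40 bytes.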
Why This Matters Architecturally
The sanity check exists to protect against catalog corruption or invalid data causing out-of-bounds memory reads during deserialization. With the incorrect check:
- False negatives: The check can pass for corrupted data that is actually too small. For example, if ndeps is 2, SizeOfItem(2) computes the size of one dependency with 2 attributes, which is less than the minimum needed for the header plus 2 separate dependency entries, each carrying its own degree and attribute count.
- False positives: These should not occur. SizeOfItem(ndeps) grows by only one attribute number per unit of ndeps, whereas MinSizeOfItems(ndeps) grows by a whole minimal item, so the old bound stays below the correct one and never rejects data that the corrected check would accept.
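A false negative can be illustrated with a truncated payload. This is only a sketch: the macro definitions are assumed (same layout as described above, with 16-bit attribute numbers), and the three helper functions are hypothetical names introduced for illustration, not PostgreSQL functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t AttrNumber;   /* assumption: 16-bit attribute numbers */

/* Assumed definitions; the real ones may differ in detail. */
#define SizeOfHeader          (3 * sizeof(uint32_t))
#define SizeOfItem(natts)     (sizeof(double) + sizeof(AttrNumber) * (1 + (natts)))
#define MinSizeOfItem         SizeOfItem(2)
#define MinSizeOfItems(ndeps) (SizeOfHeader + (ndeps) * MinSizeOfItem)

/* Old check: compares the payload against ONE item with ndeps attributes. */
static bool passes_old_check(size_t payload_size, uint32_t ndeps)
{
    return payload_size >= SizeOfItem(ndeps);
}

/* Corrected check: header plus ndeps minimal items. */
static bool passes_new_check(size_t payload_size, uint32_t ndeps)
{
    return payload_size >= MinSizeOfItems(ndeps);
}

/* True when the old check accepts a payload that cannot hold ndeps items. */
static bool is_false_negative(size_t payload_size, uint32_t ndeps)
{
    return passes_old_check(payload_size, ndeps) &&
           !passes_new_check(payload_size, ndeps);
}
```

With the assumed layout, a 20-byte payload that claims two dependencies passes the old check but fails the corrected one, so is_false_negative(20, 2) is true.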
The practical impact is limited because:
- The data originates from PostgreSQL's own serialization code under normal circumstances.
- The subsequent per-item deserialization loop has its own bounds (it reads nattributes per item and validates individually).
However, the check is defense-in-depth against catalog corruption, and having it be incorrect defeats its purpose.
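For context, per-item validation of the kind mentioned above could look roughly like the following. This is a hypothetical sketch, not PostgreSQL's actual loop: the function name items_fit, the MAX_ATTRS limit, and the item layout (degree, attribute count, attribute numbers) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int16_t AttrNumber;   /* assumption: 16-bit attribute numbers */

#define MIN_ATTRS 2           /* a dependency has at least two attributes */
#define MAX_ATTRS 8           /* hypothetical upper limit for this sketch */

/*
 * Walk ndeps items stored back to back in buf[0..len) and confirm that
 * each one fits. Returns true only if every item is in bounds and the
 * items consume the buffer exactly.
 */
static bool items_fit(const unsigned char *buf, size_t len, uint32_t ndeps)
{
    size_t off = 0;

    for (uint32_t i = 0; i < ndeps; i++)
    {
        AttrNumber natts;
        size_t fixed = sizeof(double) + sizeof(AttrNumber);

        /* The degree and the attribute count must be present. */
        if (len - off < fixed)
            return false;

        /* memcpy avoids unaligned reads from the byte buffer. */
        memcpy(&natts, buf + off + sizeof(double), sizeof(natts));
        off += fixed;

        if (natts < MIN_ATTRS || natts > MAX_ATTRS)
            return false;

        /* The attribute numbers themselves must be present. */
        size_t atts = (size_t) natts * sizeof(AttrNumber);
        if (len - off < atts)
            return false;
        off += atts;
    }

    return off == len;
}
```

A loop like this catches truncation item by item, but only after reading has started, which is why the up-front size check still matters as a first line of defense.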
Proposed Solution
The patch is minimal and surgical:
- Replace SizeOfItem(ndeps) with MinSizeOfItems(ndeps) in the size validation check within statext_dependencies_deserialize().
This aligns the function's behavior with statext_ndistinct_deserialize(), which already correctly uses MinSizeOfItems for its analogous check. The author correctly identifies this as a copy-paste or typo error rather than an intentional design choice, supported by the inconsistency with the ndistinct code path.
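The shape of the corrected check can be sketched in standalone C. This is an approximation under stated assumptions, not the patch itself: the function name check_dependencies_size, the header layout (magic, type, ndeps), the macro definitions, and the error message wording are illustrative, and the real code reports errors through PostgreSQL's own facilities rather than stderr.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int16_t AttrNumber;   /* assumption: 16-bit attribute numbers */

/* Assumed definitions; the real ones may differ in detail. */
#define SizeOfHeader          (3 * sizeof(uint32_t))
#define SizeOfItem(natts)     (sizeof(double) + sizeof(AttrNumber) * (1 + (natts)))
#define MinSizeOfItem         SizeOfItem(2)
#define MinSizeOfItems(ndeps) (SizeOfHeader + (ndeps) * MinSizeOfItem)

/*
 * Read ndeps from the header of a serialized payload and apply the
 * corrected size check. Returns 0 on success, -1 if the payload is
 * too small for the number of dependencies it claims.
 */
static int check_dependencies_size(const unsigned char *payload, size_t size,
                                   uint32_t *ndeps_out)
{
    uint32_t ndeps;
    size_t min_expected_size;

    if (size < SizeOfHeader)
        return -1;

    /* Header words: magic, type, ndeps. Only ndeps matters here. */
    memcpy(&ndeps, payload + 2 * sizeof(uint32_t), sizeof(ndeps));

    /* Before the fix, this bound was computed as SizeOfItem(ndeps). */
    min_expected_size = MinSizeOfItems(ndeps);

    if (size < min_expected_size)
    {
        fprintf(stderr,
                "invalid dependencies size %zu (expected at least %zu)\n",
                size, min_expected_size);
        return -1;
    }

    *ndeps_out = ndeps;
    return 0;
}
```

Only the line that computes min_expected_size changes. The comparison and the error path stay as they were.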
Consistency Argument
The extended statistics subsystem has parallel serialization/deserialization paths for:
- ndistinct statistics (statext_ndistinct_serialize/deserialize)
- dependencies statistics (statext_dependencies_serialize/deserialize)
- MCV lists (separate code path)
The ndistinct path already uses MinSizeOfItems correctly, making this clearly a bug in the dependencies path rather than a deliberate architectural choice.
Risk Assessment
This is an extremely low-risk fix:
- It corrects a validation check to be more accurate (not less)
- It does not change any data format or serialization logic
- It only affects the early-exit error path for malformed data
- Normal operation with valid catalog data is unaffected