2026-05-22 · claude-opus-4-6
Incremental Update: Patch Accepted by Committer
Michael Paquier (committer) has acknowledged the bug and committed to fixing it. He confirms the root cause as a typo originating from commit d08c44f7a4ec, which is the commit that originally introduced the dependencies deserialization code. He validates the fix approach, noting that MinSizeOfItems correctly aligns with the MVDependency struct definition in statistics.h.
No new technical arguments or alternative approaches were raised. This is a straightforward acceptance and commitment to apply the fix.
2026-05-20 · claude-opus-4-6
Fix Incorrect Size Check in statext_dependencies_deserialize
Core Problem
The function statext_dependencies_deserialize() in PostgreSQL's extended statistics subsystem contains a bug in its sanity check that validates the size of incoming serialized bytea data before deserialization. The issue is a semantic mismatch between what the validation macro expects and what it's being given.
Technical Details
PostgreSQL's extended statistics system supports functional dependencies (CREATE STATISTICS ... (dependencies)), which track probabilistic relationships between columns. These statistics are serialized into the pg_statistic_ext_data catalog and deserialized when needed by the planner.
During deserialization in statext_dependencies_deserialize():
- The code reads ndeps (the number of dependency entries) from the serialized header.
- It then performs a sanity check to ensure the bytea is at least large enough to contain the claimed data.
- The bug: The check uses SizeOfItem(ndeps), which computes the size of a single dependency item that has ndeps attributes. This is semantically wrong: ndeps here represents the count of dependency entries, not the number of attributes in one entry.
- The fix: It should use MinSizeOfItems(ndeps), which correctly computes header_size + ndeps * minimum_single_item_size, i.e., the minimum total size needed to hold ndeps minimally-sized dependency items (each with the minimum number of attributes, which is 2).
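The difference between the two thresholds can be sketched in a few lines of C. This is an illustration only: the macro bodies are assumptions about the serialized layout (a header of three uint32 fields, and per item a double degree, an attribute count, and the attribute numbers), AttrNumber is a local stand-in typedef, and the two helper functions are hypothetical names that do not appear in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's AttrNumber (assumed to be a 16-bit integer). */
typedef int16_t AttrNumber;

/* Assumed header layout: three uint32 fields (magic, type, ndeps). */
#define SizeOfHeader        (3 * sizeof(uint32_t))

/* Assumed item layout: degree (double), attribute count, then the attributes. */
#define SizeOfItem(natts) \
    (sizeof(double) + sizeof(AttrNumber) * (1 + (natts)))

/* Smallest possible item: a dependency over two attributes. */
#define MinSizeOfItem       SizeOfItem(2)

/* Smallest possible payload that holds ndeps items plus the header. */
#define MinSizeOfItems(ndeps) \
    (SizeOfHeader + (ndeps) * MinSizeOfItem)

/* Hypothetical helper: threshold when ndeps is misread as an attribute count. */
size_t
buggy_min_expected_size(uint32_t ndeps)
{
    return SizeOfItem(ndeps);
}

/* Hypothetical helper: threshold when ndeps is treated as an item count. */
size_t
fixed_min_expected_size(uint32_t ndeps)
{
    return MinSizeOfItems(ndeps);
}
```

Under these assumptions, and with an 8-byte double, ndeps = 2 gives a threshold of 14 bytes for the buggy form and 40 bytes for the fixed form.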
Why This Matters Architecturally
The sanity check exists to protect against catalog corruption or invalid data causing out-of-bounds memory reads during deserialization. With the incorrect check:
- False negatives: The check can pass for corrupted data that is actually too small. For example, if ndeps is small (say 2), SizeOfItem(2) computes the size of one dependency with 2 attributes, which is less than the minimum needed for the header plus 2 separate dependency entries, each carrying its own degree and attribute count.
- False positives (not expected): Assuming the macro definitions in dependencies.c, SizeOfItem(ndeps) grows by only one attribute number per unit of ndeps, while MinSizeOfItems(ndeps) grows by a whole minimal item. The old threshold should therefore never exceed the correct one, so the incorrect check is too lenient rather than too strict and should not reject valid data.
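To make the false-negative case concrete, the following sketch compares the two thresholds against a truncated payload. The byte sizes in the enum are assumptions about the serialized layout, and every function name is hypothetical; none of this is quoted from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed byte sizes of the serialized fields (not taken from the thread). */
enum
{
    HEADER_BYTES = 12,  /* three uint32 fields: magic, type, ndeps */
    DEGREE_BYTES = 8,   /* one double per dependency */
    ATTNUM_BYTES = 2    /* one 16-bit value per attribute number or count */
};

/* Old threshold: ndeps misread as the attribute count of a single item. */
size_t
old_threshold(uint32_t ndeps)
{
    return DEGREE_BYTES + ATTNUM_BYTES * ((size_t) ndeps + 1);
}

/* New threshold: header plus ndeps minimal (two-attribute) items. */
size_t
new_threshold(uint32_t ndeps)
{
    size_t      min_item = DEGREE_BYTES + ATTNUM_BYTES * 3;

    return HEADER_BYTES + (size_t) ndeps * min_item;
}

/* A payload passes a check when it is at least as large as the threshold. */
bool
old_check_accepts(size_t payload_bytes, uint32_t ndeps)
{
    return payload_bytes >= old_threshold(ndeps);
}

bool
new_check_accepts(size_t payload_bytes, uint32_t ndeps)
{
    return payload_bytes >= new_threshold(ndeps);
}

/* True if the old threshold never exceeds the new one for 0..max_ndeps. */
bool
old_never_stricter(uint32_t max_ndeps)
{
    uint32_t    n = 0;

    for (;;)
    {
        if (old_threshold(n) > new_threshold(n))
            return false;
        if (n == max_ndeps)
            return true;
        n++;
    }
}
```

With these assumed sizes, a 20-byte payload claiming ndeps = 2 is accepted by the old check (threshold 14) but rejected by the new one (threshold 40), and the old threshold stays at or below the new one across the tested range.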
The practical impact is limited because:
- The data originates from PostgreSQL's own serialization code under normal circumstances.
- The subsequent per-item deserialization loop has its own bounds (it reads nattributes per item and validates individually).
However, the check is defense-in-depth against catalog corruption, and having it be incorrect defeats its purpose.
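As an illustration of what per-item validation can look like, here is a hypothetical bounds-checked walk over the item region of a serialized buffer. It is a sketch under the assumed layout (double degree, 16-bit attribute count, then 16-bit attribute numbers), not the loop in statext_dependencies_deserialize(); the function name items_fit is invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical walker: returns true only if ndeps well-formed items fit
 * in the len bytes starting at buf (the region after the header).
 */
bool
items_fit(const unsigned char *buf, size_t len, uint32_t ndeps)
{
    size_t      off = 0;

    for (uint32_t i = 0; i < ndeps; i++)
    {
        int16_t     natts;

        /* Need room for the degree and the attribute count. */
        if (len - off < sizeof(double) + sizeof(int16_t))
            return false;
        off += sizeof(double);

        /* Copy rather than cast: the buffer may not be aligned. */
        memcpy(&natts, buf + off, sizeof(natts));
        off += sizeof(natts);

        /* A dependency relates at least two attributes. */
        if (natts < 2)
            return false;

        /* Need room for the attribute numbers themselves. */
        if (len - off < (size_t) natts * sizeof(int16_t))
            return false;
        off += (size_t) natts * sizeof(int16_t);
    }

    return true;
}
```

An up-front size check and a per-item walk like this are complementary: the first rejects obviously short payloads cheaply, while the second catches inconsistencies that only appear once individual attribute counts are read.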
Proposed Solution
The patch is minimal and surgical:
- Replace SizeOfItem(ndeps) with MinSizeOfItems(ndeps) in the size validation check within statext_dependencies_deserialize().
This aligns the function's behavior with statext_ndistinct_deserialize(), which already correctly uses MinSizeOfItems for its analogous check. The author correctly identifies this as a copy-paste or typo error rather than an intentional design choice, supported by the inconsistency with the ndistinct code path.
Consistency Argument
The extended statistics subsystem has parallel serialization/deserialization paths for:
- ndistinct statistics (statext_ndistinct_serialize/deserialize)
- dependencies statistics (statext_dependencies_serialize/deserialize)
- MCV lists (separate code path)
The ndistinct path already uses MinSizeOfItems correctly, making this clearly a bug in the dependencies path rather than a deliberate architectural choice.
Risk Assessment
This is an extremely low-risk fix:
- It corrects a validation check to be more accurate (not less)
- It does not change any data format or serialization logic
- It only affects the early-exit error path for malformed data
- Normal operation with valid catalog data is unaffected