Fix incorrect size check in statext_dependencies_deserialize

First seen: 2026-05-19 14:29:56+00:00 · Messages: 3 · Participants: 2

Latest Update

2026-05-27 · claude-opus-4-6

Patch Committed

Michael Paquier has committed the fix as 0b8fa5fd37b1. The message is a brief commit notification that closes out the thread; it adds no new technical discussion, alternative approaches, or context beyond the prior analysis.

History (2 prior analyses)
2026-05-22 · claude-opus-4-6

Incremental Update: Patch Accepted by Committer

Michael Paquier (committer) has acknowledged the bug and committed to fixing it. He confirms the root cause as a typo originating from commit d08c44f7a4ec, which is the commit that originally introduced the dependencies deserialization code. He validates the fix approach, noting that MinSizeOfItems correctly aligns with the MVDependency struct definition in statistics.h.

No new technical arguments or alternative approaches were raised. This is a straightforward acceptance and commitment to apply the fix.


2026-05-20 · claude-opus-4-6

Fix Incorrect Size Check in statext_dependencies_deserialize

Core Problem

The function statext_dependencies_deserialize() in PostgreSQL's extended statistics subsystem contains a bug in its sanity check that validates the size of incoming serialized bytea data before deserialization. The issue is a semantic mismatch between what the validation macro expects and what it's being given.

Technical Details

PostgreSQL's extended statistics system supports functional dependencies (CREATE STATISTICS ... (dependencies)), which track probabilistic relationships between columns. These statistics are serialized into the pg_statistic_ext_data catalog and deserialized when needed by the planner.

During deserialization in statext_dependencies_deserialize():

  1. The code reads ndeps (the number of dependency entries) from the serialized header.
  2. It then performs a sanity check to ensure the bytea is at least large enough to contain the claimed data.
  3. The bug: The check uses SizeOfItem(ndeps), which computes the size of a single dependency item that has ndeps attributes. This is semantically wrong — ndeps here represents the count of dependency entries, not the number of attributes in one entry.
  4. The fix: It should use MinSizeOfItems(ndeps), which correctly computes header_size + ndeps * minimum_single_item_size — i.e., the minimum total size needed to hold ndeps minimally-sized dependency items (each with the minimum number of attributes, which is 2).
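
The difference between the two bounds can be sketched in a few lines of C. This is an illustration rather than the code in dependencies.c: the macro bodies, the three-field header, the 16-bit AttrNumber typedef, and the two helper functions are assumptions chosen to match the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: attribute numbers are 16-bit, as with PostgreSQL's AttrNumber. */
typedef int16_t AttrNumber;

/* Assumed header layout: three 32-bit fields (magic, type, ndeps). */
#define SizeOfHeader (3 * sizeof(uint32_t))

/* One serialized item: degree (double), attribute count, then natts attributes. */
#define SizeOfItem(natts) \
    (sizeof(double) + sizeof(AttrNumber) * (1 + (natts)))

/* Smallest possible item: a dependency over two attributes. */
#define MinSizeOfItem SizeOfItem(2)

/* Header plus ndeps minimally sized items. */
#define MinSizeOfItems(ndeps) \
    (SizeOfHeader + (ndeps) * MinSizeOfItem)

/* Hypothetical helper: bound used by the buggy check (ndeps misread as natts). */
static size_t old_min_expected_size(uint32_t ndeps)
{
    return SizeOfItem(ndeps);
}

/* Hypothetical helper: bound used by the fixed check (ndeps counts entries). */
static size_t new_min_expected_size(uint32_t ndeps)
{
    return MinSizeOfItems(ndeps);
}
```

On a platform with an 8-byte double and a 2-byte AttrNumber, these definitions give 14 bytes for old_min_expected_size(2) and 40 bytes for new_min_expected_size(2), so a 20-byte payload claiming two dependencies would clear the old bound but not the new one.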

Why This Matters Architecturally

The sanity check exists to protect against catalog corruption or invalid data causing out-of-bounds memory reads during deserialization. With the incorrect check:

  • False negatives: The check can pass for corrupted data that is actually too small. For example, if ndeps is 2, SizeOfItem(2) computes the size of one dependency with 2 attributes, which is less than the minimum needed for the header plus 2 separate dependency entries, each carrying its own degree and attribute count.
  • False positives: These do not appear to be possible. SizeOfItem(ndeps) grows by only one attribute number per unit of ndeps, while MinSizeOfItems(ndeps) grows by a whole minimal item, so the incorrect bound stays below the correct one and valid data is not rejected. The error only makes the check more lenient than intended.

The practical impact is limited because:

  1. The data originates from PostgreSQL's own serialization code under normal circumstances.
  2. The subsequent per-item deserialization loop performs its own checks (it reads nattributes for each item and validates the items individually).

However, the check is defense-in-depth against catalog corruption, and having it be incorrect defeats its purpose.
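
Point 2 above refers to per-item validation inside the loop. As a rough illustration of what such a check involves, the sketch below walks a payload item by item and refuses to read past its end. It is a hypothetical walker, not the loop in dependencies.c; the item layout (degree, attribute count, attributes), the 16-bit AttrNumber, and the name items_fit are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: attribute numbers are 16-bit, as with PostgreSQL's AttrNumber. */
typedef int16_t AttrNumber;

/*
 * Hypothetical walker: returns true only if ndeps items, each laid out as
 * degree (double), attribute count (AttrNumber), then the attributes,
 * fit within len bytes. buf points just past the serialized header.
 */
static bool items_fit(const char *buf, size_t len, uint32_t ndeps)
{
    size_t off = 0;     /* invariant: off <= len */

    for (uint32_t i = 0; i < ndeps; i++)
    {
        const size_t fixed = sizeof(double) + sizeof(AttrNumber);
        AttrNumber natts;

        /* The degree and the attribute count must fit before they are read. */
        if (len - off < fixed)
            return false;
        memcpy(&natts, buf + off + sizeof(double), sizeof(AttrNumber));
        off += fixed;

        /* A dependency needs at least two attributes. */
        if (natts < 2)
            return false;

        /* The attribute array must fit as well. */
        if (len - off < (size_t) natts * sizeof(AttrNumber))
            return false;
        off += (size_t) natts * sizeof(AttrNumber);
    }
    return true;
}
```

A payload holding exactly one two-attribute item passes for ndeps = 1 and fails for ndeps = 2. An up-front MinSizeOfItems(ndeps) check rejects the second case before any per-item work begins.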

Proposed Solution

The patch is minimal and surgical:

  • Replace SizeOfItem(ndeps) with MinSizeOfItems(ndeps) in the size validation check within statext_dependencies_deserialize().

This aligns the function's behavior with statext_ndistinct_deserialize(), which already uses MinSizeOfItems for its analogous check. The author identifies the bug as a copy-paste error or typo rather than an intentional design choice, a reading supported by the inconsistency with the ndistinct code path.

Consistency Argument

The extended statistics subsystem has parallel serialization/deserialization paths for:

  • ndistinct statistics (statext_ndistinct_serialize/deserialize)
  • dependencies statistics (statext_dependencies_serialize/deserialize)
  • MCV lists (separate code path)

The ndistinct path already uses MinSizeOfItems correctly, making this clearly a bug in the dependencies path rather than a deliberate architectural choice.

Risk Assessment

This is an extremely low-risk fix:

  • It corrects a validation check to be more accurate (not less)
  • It does not change any data format or serialization logic
  • It only affects the early-exit error path for malformed data
  • Normal operation with valid catalog data is unaffected