amcheck support for BRIN indexes

First seen: 2025-04-22 09:05:46+00:00 · Messages: 25 · Participants: 5

Latest Update

2026-05-22 · claude-opus-4-6

amcheck Support for BRIN Indexes: Deep Technical Analysis

Core Problem

PostgreSQL's amcheck extension provides corruption-detection capabilities for B-tree indexes (and more recently GIN indexes), but BRIN (Block Range INdex) indexes have no equivalent validation tooling. BRIN indexes are structurally very different from B-tree indexes — they store summary data (min/max values, inclusion ranges, bloom filters) for ranges of heap pages rather than individual tuple pointers. This structural difference means BRIN corruption can manifest in unique ways: corrupted revmap entries, orphaned index tuples, inconsistent range summaries, or summary data that doesn't actually cover the heap tuples it claims to represent.

Without amcheck support, BRIN corruption can go undetected until a query returns wrong results. From the user's perspective that looks like silent data loss, which makes this a significant operational gap for production systems that rely on BRIN indexes for large tables.
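
To make the failure mode concrete, here is a minimal toy model of a min/max BRIN index in Python. It is not PostgreSQL code: the names RangeSummary, summarize and brin_scan are hypothetical, and PAGES_PER_RANGE is shrunk to 2 for readability. It only illustrates how a summary that is too narrow causes an index scan to skip a range that actually holds a matching row.

```python
from dataclasses import dataclass

PAGES_PER_RANGE = 2  # toy value; real BRIN defaults to 128


@dataclass
class RangeSummary:
    lo: int
    hi: int


def summarize(heap_pages):
    """Build one min/max summary per block range of a toy heap."""
    summaries = []
    for start in range(0, len(heap_pages), PAGES_PER_RANGE):
        pages = heap_pages[start:start + PAGES_PER_RANGE]
        values = [v for page in pages for v in page]
        summaries.append(RangeSummary(min(values), max(values)))
    return summaries


def brin_scan(heap_pages, summaries, key):
    """Return matches for `value = key`, skipping ranges whose summary excludes key."""
    hits = []
    for i, s in enumerate(summaries):
        if not (s.lo <= key <= s.hi):
            continue  # the whole range is skipped without reading the heap
        start = i * PAGES_PER_RANGE
        for page in heap_pages[start:start + PAGES_PER_RANGE]:
            hits.extend(v for v in page if v == key)
    return hits


heap = [[1, 5], [7, 9], [20, 25], [30, 42]]
good = summarize(heap)
found_before = brin_scan(heap, good, 42)

# Simulate corruption: the second range's upper bound is too small.
bad = [good[0], RangeSummary(20, 30)]
found_after = brin_scan(heap, bad, 42)
```

With the intact summaries the scan finds the row; with the narrowed summary it returns nothing and raises no error, which is exactly the situation a checker has to detect.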

Architecture of the Proposed Solution

The patch introduces brin_index_check() with two major verification phases:

Phase 1: Index Structure Check

This validates the internal consistency of the BRIN index without touching the heap:

  1. Meta page validation — Verifies the BRIN metapage contains sane values (pages per range within BRIN_MAX_PAGES_PER_RANGE, etc.)

  2. Revmap verification — Walks the revmap (reverse map) pages and validates that:

    • Every valid revmap entry points to an index tuple with the expected range block number
    • Index tuples are consistent with the tuple descriptor
    • Empty ranges correctly have allnulls = true and hasnulls = false (an undocumented invariant known from the BRIN source code)
  3. Regular pages check (optional) — Walks regular index pages verifying that every index tuple has a corresponding revmap entry pointing back to it. This detects "orphan" index tuples — tuples that exist but aren't referenced by the revmap.
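
The three structural checks above can be sketched against a toy in-memory index. This is an illustrative model only, not the patch's code: IndexTuple, check_structure and the dict-based page layout are hypothetical, and MAX_PAGES_PER_RANGE merely stands in for BRIN_MAX_PAGES_PER_RANGE.

```python
from dataclasses import dataclass

MAX_PAGES_PER_RANGE = 131072  # toy stand-in for BRIN_MAX_PAGES_PER_RANGE


@dataclass
class IndexTuple:
    blkno: int                 # first heap block of the range this tuple summarizes
    allnulls: bool = False
    hasnulls: bool = False
    empty_range: bool = False


def check_structure(pages_per_range, revmap, tuples, check_orphans=True):
    """Return a list of problems found in a toy BRIN index.

    revmap: list where entry i is a tuple id (or None) for range i.
    tuples: dict mapping tuple id -> IndexTuple stored on regular pages.
    """
    problems = []
    if not 1 <= pages_per_range <= MAX_PAGES_PER_RANGE:
        problems.append(f"metapage: bad pages_per_range {pages_per_range}")
        return problems  # range arithmetic below would be meaningless

    referenced = set()
    for i, tid in enumerate(revmap):
        if tid is None:
            continue  # unsummarized range: nothing to verify
        tup = tuples.get(tid)
        if tup is None:
            problems.append(f"revmap[{i}]: dangling pointer {tid}")
            continue
        referenced.add(tid)
        expected = i * pages_per_range
        if tup.blkno != expected:
            problems.append(
                f"revmap[{i}]: tuple covers block {tup.blkno}, expected {expected}")
        if tup.empty_range and not (tup.allnulls and not tup.hasnulls):
            problems.append(f"revmap[{i}]: empty range with bad null flags")

    if check_orphans:  # the optional regular-pages pass
        for tid in sorted(set(tuples) - referenced):
            problems.append(f"orphan index tuple at {tid}")
    return problems


tuples = {
    (1, 1): IndexTuple(blkno=0),
    (1, 2): IndexTuple(blkno=4),   # never referenced by the revmap below
    (1, 3): IndexTuple(blkno=8),   # referenced from the wrong revmap slot
}
report = check_structure(4, [(1, 1), (1, 3), None], tuples)
```

Here the revmap slot for range 1 points at a tuple that claims block 8 instead of block 4, and the tuple at (1, 2) is an orphan; both end up in report.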

Phase 2: Heap All Indexed Check

This is the expensive cross-validation between heap and index data. It verifies that every heap tuple is "covered" by its corresponding BRIN range summary. This is analogous to B-tree's heapallindexed check but architecturally more complex because BRIN doesn't store individual tuples.
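
As a rough sketch of this phase, the loop below walks a toy heap, looks up the summary for each block's range, and reports every value the summary fails to cover. The function names and the covers callback are hypothetical; how the real patch decides coverage is the subject of the design debate that follows.

```python
def heap_all_indexed(heap_pages, pages_per_range, summaries, covers):
    """Yield (block, value) for every heap value its range summary fails to cover.

    summaries: dict mapping range number -> summary; absent means unsummarized.
    covers:    callable(summary, value) -> bool, supplied per opclass.
    """
    for blkno, page in enumerate(heap_pages):
        summary = summaries.get(blkno // pages_per_range)
        if summary is None:
            continue  # unsummarized ranges are always scanned, so nothing is hidden
        for value in page:
            if not covers(summary, value):
                yield blkno, value


def minmax_covers(summary, value):
    """Toy coverage test for a (min, max) summary."""
    lo, hi = summary
    return lo <= value <= hi


heap = [[3, 8], [5, 12], [40, 41], [45, 90]]
summaries = {0: (3, 12), 1: (40, 50)}   # range 1 should reach 90
violations = list(heap_all_indexed(heap, 2, summaries, minmax_covers))
```

Only the value 90 on block 3 is reported, because its range summary stops at 50. Passing the coverage test in as a callback mirrors the fact that the answer depends on the opclass, not on the scanning loop.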

Locking Strategy

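The check acquires ShareUpdateExclusiveLock on the index. This lock mode conflicts with itself and with all stronger modes, but not with the weaker modes taken by ordinary SELECT, INSERT, UPDATE, and DELETE, so normal reads and writes can continue while the check runs.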

Key Design Debate: How to Validate Heap Consistency

The most architecturally interesting discussion centers on how to verify that heap tuples are covered by their BRIN range summaries. Three approaches were considered:

Approach 1: addValue() Function

Call the opclass's addValue() function — if it returns FALSE, the range already covers the value. Rejected because addValue() can return TRUE even when the value is already covered (e.g., minmax_multi may reorganize internal data structures, triggering a TRUE return that doesn't indicate corruption).

Approach 2: consistent() Function (Currently Adopted)

Build a ScanKey from each heap tuple's value and call the opclass's consistent() function. If the range says it doesn't contain the value, there's corruption.

The strategy number problem: To build a ScanKey, you need a strategy number, but different opclasses assign different meaning to the same strategy numbers (equality is strategy 1 in Bloom but strategy 3 in minmax). Furthermore, not all opclasses support the equality operator (box_inclusion_ops doesn't, nor do PostGIS BRIN opclasses).
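
The strategy number clash can be illustrated with two toy consistent() functions. This is a simplified Python model, not the opclass code: the function names and the two-hash bloom filter are hypothetical, and only the strategy numbers (B-tree style 1 to 5 for minmax, 1 for bloom equality) follow the text above.

```python
# Strategy numbers as the two opclass families define them.
BT_LESS, BT_LESS_EQUAL, BT_EQUAL, BT_GREATER_EQUAL, BT_GREATER = 1, 2, 3, 4, 5
BLOOM_EQUAL = 1


def minmax_consistent(summary, strategy, key):
    """Toy minmax consistent(): could the range hold a row matching `col <op> key`?"""
    lo, hi = summary
    if strategy == BT_LESS:
        return lo < key
    if strategy == BT_LESS_EQUAL:
        return lo <= key
    if strategy == BT_EQUAL:
        return lo <= key <= hi
    if strategy == BT_GREATER_EQUAL:
        return hi >= key
    if strategy == BT_GREATER:
        return hi > key
    raise ValueError(f"unsupported strategy {strategy}")


def _bit_positions(value, nbits):
    # Two cheap, deterministic hashes; a real bloom filter uses better ones.
    return (value % nbits, (value * 31 + 7) % nbits)


def bloom_build(values, nbits=64):
    """Build a toy bloom summary as an integer bitmap."""
    bits = 0
    for v in values:
        for pos in _bit_positions(v, nbits):
            bits |= 1 << pos
    return bits


def bloom_consistent(bits, strategy, key, nbits=64):
    """Toy bloom consistent(): only equality is supported."""
    if strategy != BLOOM_EQUAL:
        raise ValueError(f"unsupported strategy {strategy}")
    return all((bits >> pos) & 1 for pos in _bit_positions(key, nbits))


summary = (10, 20)   # a range whose heap values include 10
right = minmax_consistent(summary, BT_EQUAL, 10)
wrong = minmax_consistent(summary, BLOOM_EQUAL, 10)   # strategy 1 means "<" here
```

With the correct strategy the range is reported as consistent with the value 10. Reusing the bloom equality number on the minmax opclass silently evaluates `10 < 10` instead and returns false, which a checker would misread as corruption. That is why the check cannot hard-code one strategy number for all opclasses.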

The current solution makes the operator list a variadic optional argument.

Approach 3: New withinRange() Support Function

Add a new optional function to the BRIN opclass API:

bool withinRange(BrinDesc *bdesc, BrinValues *column, Datum val, bool isnull)

This was prototyped (patches 0004-0005 in v5) but ultimately deferred after Álvaro Herrera and Tomas Vondra expressed preference for using existing consistent() primitives to avoid expanding the opclass API solely for a contrib module.

Unresolved tension: Arseniy raised the valid point that the consistent() approach has UX issues — users shouldn't need to know internal operator semantics to run a health check. He proposed adding an "amcheck strategy number" to BrinOpcInfo as a mapping, but this hasn't been resolved.

Infrastructure Contributions

Common amcheck Infrastructure (0003 patch)

The patch factors out CheckIndexCheckXMin() — the logic for verifying whether the current snapshot is safe for heap-vs-index comparison (checking indcheckxmin). This was previously embedded in B-tree amcheck code but is needed by BRIN, GiST, and GIN amcheck implementations. This is proposed as a shared utility in verify_common.

Read Stream Integration

Following Andrey Borodin's suggestion, the patch adopts the read_stream infrastructure (a modern PostgreSQL I/O optimization) for sequential page reads during verification, replacing simple ReadBufferExtended loops.

BRIN Internal Exposure

Error Classification

The patch uses ERRCODE_INDEX_CORRUPTED for structural issues (recoverable via REINDEX). The thread also discussed whether heap-inconsistency errors should instead use ERRCODE_DATA_CORRUPTED (unrecoverable). The final decision was to use ERRCODE_INDEX_CORRUPTED for all cases, with a more nuanced error message for the heap case: "Index %s is not consistent with the heap" rather than "Index %s is corrupted."

Testing Strategy

Open Issues

  1. Operator specification UX — The variadic operator argument works but is ergonomically poor for automated cluster-wide checks. A catalog-side mapping (in BrinOpcInfo or a new catalog table) would be cleaner but requires more invasive changes.
  2. Unsummarized ranges — The current heap scan touches all pages including unsummarized ranges (wasted I/O). An optimization to scan only summarized ranges was discussed but deferred since autovacuum typically keeps unsummarized ranges minimal.
  3. pg_amcheck integration — Not yet implemented; needed for command-line cluster-wide checking.
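
For the unsummarized-ranges item, the deferred optimization amounts to restricting the heap scan to blocks that belong to summarized ranges. A minimal sketch of that block selection, using a hypothetical helper name and a plain set of summarized range numbers rather than a real revmap:

```python
def blocks_to_scan(nblocks, pages_per_range, summarized):
    """Return the heap block numbers that belong to summarized ranges.

    nblocks:    total number of heap blocks.
    summarized: iterable of range numbers that have a summary tuple.
    """
    blocks = []
    for range_no in sorted(summarized):
        start = range_no * pages_per_range
        # The last range may be partial, so clamp at the end of the heap.
        blocks.extend(range(start, min(start + pages_per_range, nblocks)))
    return blocks


selected = blocks_to_scan(10, 4, {0, 2})
```

With 10 heap blocks, 4 pages per range and ranges 0 and 2 summarized, the scan would read blocks 0 to 3 and 8 to 9 and skip the unsummarized range in between.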