Vacuumlo improvements

First seen: 2026-05-12 17:34:10+00:00 · Messages: 7 · Participants: 3

Latest Update

2026-06-04 · claude-opus-4-6

Vacuumlo Improvements — May 2026 Summary

Overview

A critical correctness bug was identified in vacuumlo: the tool silently skips columns typed as domains over oid or lo, causing it to incorrectly classify referenced large objects as orphaned and delete them without warning. The thread rapidly converged on a fix and then expanded scope to address additional blind spots.

The Core Bug: Domain-over-OID Blindness

vacuumlo scans pg_attribute joined with pg_type to find columns that could hold large object references, but only checks for exact type matches (oid or lo). It does not resolve domain types via pg_type.typbasetype. Since domains over base types are idiomatic PostgreSQL, this is a fundamental correctness failure in reference resolution — not an edge case.

The failure mode is particularly dangerous because:

Fix: Recursive Domain Resolution (Patch Provided)

Nathan Bossart provided a patch that modifies the catalog query to recursively walk pg_type.typbasetype chains (handling domains-over-domains) until reaching a base type (typbasetype = 0), then checks if that base type is oid or lo. This is a well-understood catalog query pattern with negligible performance impact given shallow type hierarchies.

Scope Expansion: Composite Types and Arrays

Sami Imseih proposed extending detection to:

Both are catalog-detectable but introduce architectural complexity: vacuumlo currently generates simple column IN (...) predicates, but composites require field-access expressions (e.g., ("data")."lo_ref") and arrays require unnest() or ANY() constructs. Imseih himself noted diminishing returns, since users sometimes store LO references in text columns which are fundamentally undetectable.

Converged Strategy

The thread settled on a three-pronged approach (explicitly agreed by Bossart):

  1. Improve detection — recursive domain resolution (immediate fix), plus composite/array support (future work)
  2. Improve documentation — make data-loss risk prominent, document what vacuumlo cannot catch
  3. Steer users toward the lo extension — recommend lo_manage() triggers as the preferred lifecycle management approach, with statement-level BEFORE TRUNCATE triggers to handle the TRUNCATE gap

Additional Notes

History (1 prior analysis)
2026-06-04 · claude-opus-4-6

Incremental Update: Sami Imseih Submits Concrete Patch Set

Patch Submission

Sami Imseih has submitted a two-patch series implementing the previously discussed improvements:

Patch 0001: Recursive type resolution with expression generation

This is the most technically significant artifact in the thread so far. Rather than just fixing the domain-resolution blind spot (Bossart's original scope), this patch implements the broader composite/array detection strategy that Imseih previously proposed. Key architectural characteristics:

  • vacuumlo now recursively searches for OID or LO types inside complex data types (domains, composites, and presumably arrays)
  • The query returns the search expression appropriate to the type, which is then used in the DELETE statement. This addresses the previously identified challenge that composite types require field-access expressions rather than simple column references — the catalog query itself now generates the appropriate extraction expression (e.g., ("col")."field" for composites)

This is a non-trivial design choice: rather than hard-coding expression patterns in application logic, the SQL catalog query dynamically produces the correct access expression based on the type structure it discovers. This keeps the C code simpler at the cost of a more complex catalog query.

Patch 0002: Documentation improvements

Adds explicit mention of the data-loss risk and steers users toward the lo extension, implementing the documentation prong of the three-pronged strategy.

What Remains Unclear

  • No details on whether array types (OID[], LO[]) are handled via unnest() or ANY() constructs
  • No mention of test coverage improvements (which Bossart previously flagged as needed)
  • The patch has not yet received review from Bossart or others
  • The apology for "late reply" suggests this was developed over the ~week since the composite/array discussion