Fix regression in vacuumdb --analyze-in-stages for partitioned tables

First seen: 2026-05-29 08:40:56+00:00 · Messages: 2 · Participants: 2

Latest Update

2026-06-01 · claude-opus-4-6


Core Problem

A refactoring commit (c4067383cb2) introduced a subtle regression in vacuumdb's handling of partitioned tables when --analyze-in-stages is used. The regression causes partitioned tables to be silently excluded from analysis during staged analyze operations, breaking documented behavior.

Technical Background

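PostgreSQL's vacuumdb utility supports several modes of operation, among them plain vacuum (the default), --analyze-only, and --analyze-in-stages. The latter two run ANALYZE without VACUUM; --analyze-in-stages does so in three passes, starting with a minimal statistics target so that usable statistics are available quickly.

ANALYZE on a partitioned table is meaningful: it collects statistics that describe the aggregate data distribution across all partitions, which the optimizer uses when planning queries against the parent table, for example for join estimation. Skipping partitioned tables during analysis therefore degrades query plan quality.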

The Regression Mechanism

The original feature commit (6429e5b77) introduced support for partitioned tables in vacuumdb --analyze-only and --analyze-in-stages. The code used a boolean field analyze_only in the vacuumingOptions struct, which was set to true for BOTH --analyze-only and --analyze-in-stages:

case 3:  /* --analyze-in-stages */
    analyze_in_stages = vacopts.analyze_only = true;
    break;

The catalog query construction logic then checked vacopts->analyze_only to decide whether to include RELKIND_PARTITIONED_TABLE in the relation kinds filter.

The refactoring commit c4067383cb2 replaced the boolean analyze_only with an enum mode field whose values include MODE_ANALYZE and MODE_ANALYZE_IN_STAGES, so the two options no longer share a single flag.

However, the catalog query logic was only updated to check for MODE_ANALYZE:

if (vacopts->mode == MODE_ANALYZE)

This missed MODE_ANALYZE_IN_STAGES, meaning that when --analyze-in-stages is used, the query falls through to the else branch which only includes regular tables and materialized views—silently dropping partitioned tables from all three analysis stages.
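The mismatch can be reproduced in isolation with a small model of the two checks. This is only a sketch and not the vacuumdb source: the RunMode type name, the MODE_VACUUM value, and the function and macro names are assumptions made for illustration, and the relkind letters ('r' for ordinary tables, 'm' for materialized views, 'p' for partitioned tables) stand in for the filter built into the catalog query.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the refactored enum; MODE_VACUUM is assumed. */
typedef enum
{
    MODE_VACUUM,
    MODE_ANALYZE,
    MODE_ANALYZE_IN_STAGES
} RunMode;

/* pg_class.relkind letters: r = table, m = matview, p = partitioned table */
#define KINDS_WITH_PARTITIONED "'r', 'm', 'p'"
#define KINDS_PLAIN            "'r', 'm'"

/* Before the refactoring: one boolean, set by both analyze options. */
static const char *
relkinds_before(bool analyze_only)
{
    return analyze_only ? KINDS_WITH_PARTITIONED : KINDS_PLAIN;
}

/* After the refactoring (regressed): only MODE_ANALYZE is tested. */
static const char *
relkinds_regressed(RunMode mode)
{
    if (mode == MODE_ANALYZE)
        return KINDS_WITH_PARTITIONED;
    return KINDS_PLAIN;
}

/* Contrast the two behaviors for the --analyze-in-stages case. */
static void
show_regression(void)
{
    /* --analyze-in-stages used to set analyze_only = true */
    assert(strcmp(relkinds_before(true), KINDS_WITH_PARTITIONED) == 0);

    /* the same option now falls into the else branch and loses 'p' */
    assert(strcmp(relkinds_regressed(MODE_ANALYZE_IN_STAGES), KINDS_PLAIN) == 0);

    /* --analyze-only is unaffected */
    assert(strcmp(relkinds_regressed(MODE_ANALYZE), KINDS_WITH_PARTITIONED) == 0);
}
```

Calling show_regression() passes all three assertions, which is exactly the problem: the second assertion documents that the staged mode receives the narrower filter.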

Proposed Solution

The fix is straightforward: extend the condition to also match MODE_ANALYZE_IN_STAGES:

if (vacopts->mode == MODE_ANALYZE || vacopts->mode == MODE_ANALYZE_IN_STAGES)

The patch also adds a regression test covering partitioned tables under --analyze-in-stages. The absence of such a test is what allowed the problem to slip through review of the refactoring commit.

Architectural Implications

This is a classic example of how refactoring that replaces implicit semantic grouping (a boolean that was true for multiple related modes) with explicit enumeration (distinct enum values for each mode) can introduce regressions when not all code paths that relied on the implicit grouping are updated. The refactoring improved code clarity but broke a semantic equivalence that existed between the two analyze modes with respect to relation kind filtering.

The regression is particularly insidious because:

  1. It produces no error—partitioned tables are simply silently skipped
  2. The user may not notice degraded statistics quality immediately
  3. The documentation still promises the behavior works

Key Design Consideration

A potential alternative design would be to introduce a helper function like mode_is_analyze_only(mode) that returns true for both MODE_ANALYZE and MODE_ANALYZE_IN_STAGES, encapsulating the semantic relationship between these modes. This would prevent similar regressions if additional analyze-related modes are added in the future. However, the simple conditional fix is appropriate for a bug fix commit.
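A minimal sketch of that helper follows. It is not part of the patch; the RunMode type name, the MODE_VACUUM value, and wants_partitioned_tables() are assumptions made for illustration, and only mode_is_analyze_only() is the name suggested above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refactored enum; MODE_VACUUM is assumed. */
typedef enum
{
    MODE_VACUUM,
    MODE_ANALYZE,
    MODE_ANALYZE_IN_STAGES
} RunMode;

/*
 * True for every mode that runs ANALYZE without VACUUM.
 * The switch deliberately has no default case, so a compiler with
 * -Wswitch enabled can flag this function when a new mode is added.
 */
static bool
mode_is_analyze_only(RunMode mode)
{
    switch (mode)
    {
        case MODE_ANALYZE:
        case MODE_ANALYZE_IN_STAGES:
            return true;
        case MODE_VACUUM:
            return false;
    }
    return false;               /* not reached for valid enum values */
}

/* Hypothetical caller: decide whether the relkind filter includes 'p'. */
static bool
wants_partitioned_tables(RunMode mode)
{
    return mode_is_analyze_only(mode);
}
```

With this shape, the relation kind filter and any other code that treats the two analyze modes alike would call one predicate instead of repeating the two-way comparison.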