Track skipped tables during autovacuum and autoanalyze

2026-05-18 · claude-opus-4-6

Incremental Update — 2026-05-14 through 2026-05-16

Summary

A new reviewer (Zsolt Parragi from Percona) joined the thread and surfaced three concrete issues with the patch. Two were real bugs; the third was a clarification. This triggered v8 and v9 of the patch with substantive fixes.

New Issues Raised

1. Missing `PGSTAT_FILE_FORMAT_ID` bump (real bug, quickly fixed)

Zsolt pointed out that adding 8 new fields to PgStat_StatTabEntry changes the on-disk stats file layout, which requires incrementing PGSTAT_FILE_FORMAT_ID so that an upgraded server doesn't misparse stale stats files. This was simply overlooked. Sami fixed it in v8.

2. VACUUM FULL asymmetry with skip counters (not a bug)

Zsolt reported an apparent asymmetry: VACUUM (FULL, ANALYZE, SKIP_LOCKED) t increments skipped_analyze_count but not skipped_vacuum_count. Sami clarified this is intentional and consistent with existing behavior—vacuum_count and last_vacuum already exclude VACUUM FULL (which is really a rewrite, not a standard vacuum). This was mentioned earlier in the thread.

3. Orphaned stats entry on concurrent DROP TABLE (real bug, significant fix)

This is the most technically interesting finding. Zsolt demonstrated via injection-point testing that a narrow race window exists:

After RangeVarGetRelid() returns a valid Oid and the syscache is released, but before pgstat_report_skipped_vacuum_analyze() updates the stats entry...
...if the table is dropped, pgstat_get_entry_ref_locked() creates a new stats entry for the now-gone relation.
This orphaned entry persists permanently—it survives pg_stat_reset() and is visible via internal functions like pg_stat_get_skipped_autoanalyze_count(), though not in pg_stat_all_tables (which joins against pg_class).

The fundamental tension: you cannot hold a lock while updating the entry, because the entire purpose of the stat is to record failure to acquire a lock.

v8 fix: Sami adopted the pattern from pgstat_init_function_usage()—after pgstat_get_entry_ref_locked() creates or locates the entry, perform a second syscache lookup to verify the relation still exists. If it's gone, drop the freshly-created entry instead of populating it. This prevents leaked entries.

Zsolt noted this isn't an exact mirror of pgstat_init_function_usage(), which conditionally rechecks only when a created flag indicates a fresh entry. The skip-stats path always rechecks, which is slightly more conservative but avoids an additional pgstat_lock_entry call.

4. Injection-point isolation tests (v8-0002)

Sami added injection-point-based isolation tests covering:

DROP + ROLLBACK: Lock is skipped, table is dropped but rolled back → stats are correctly updated (entry exists).
DROP + COMMIT: Lock is skipped, table is dropped and committed → no orphaned stats entry remains.

Tests use pg_stat_get_skipped_vacuum_count() to verify correctness.

5. v9 minor cleanups

Injection point renamed from expand-vacuum-rel-skip-locked to skipped-vacuum-analyze-before-entry-lock to reflect its actual location inside pgstat_report_skipped_vacuum_analyze().
Removed the comment claiming the pattern "mirrors" pgstat_init_function_usage() per Zsolt's feedback that the patterns are similar but not identical.