Can we use Statistics Import and Export feature to perforamance testing?

First seen: 2025-04-08 00:20:58+00:00 · Messages: 11 · Participants: 5

Latest Update

2026-06-01 · claude-opus-4-6

Technical Analysis: Statistics Import/Export for Performance Testing

Core Problem

The thread addresses a fundamental question about the PostgreSQL statistics import/export feature (introduced in recent versions): can exported statistics from a production environment be used in a staging environment to reproduce the same query plans for performance testing?

This is a common enterprise scenario: organizations maintain separate production and staging environments with different data volumes but want to validate that application changes won't cause plan regressions in production. The Statistics Import and Export feature (pg_dump --statistics-only, pg_restore_relation_stats(), pg_restore_attribute_stats()) seems like it could solve this, but the reality is more nuanced.

Why This Matters Architecturally

The PostgreSQL planner does not rely solely on pg_statistic catalog entries to estimate costs. Several other factors influence plan selection:

1. Relation Size Estimation (table_block_relation_estimate_size())

This is the critical architectural insight in the thread. The planner's row estimation works as follows:

density = reltuples / relpages   (from pg_class)
estimated_rows = density * actual_current_pages  (from filesystem)

The function RelationGetNumberOfBlocks(rel) queries the actual physical size of the relation on disk. This means even if you perfectly replicate pg_class.reltuples and pg_class.relpages via statistics import, the planner will still use the real number of blocks in the staging table to scale the tuple estimate. This design exists intentionally — it allows statistics to remain useful as tables grow between ANALYZE runs without becoming stale.

This is well-documented in the row estimation examples: the planner multiplies tuple density by current pages, making statistics import insufficient on its own when table sizes differ.

2. Index OID Ordering

When multiple paths have "fuzzily equal" costs (see add_path() and compare_path_costs_fuzzily()), the tiebreaker can depend on the order indexes are presented. Since get_relation_info() sorts indexes by OID, and OIDs are not guaranteed to match after dump/restore, plan choices could differ even with identical cost estimates. This is an edge case but represents a real source of non-determinism.

3. GUC Configuration

Parameters like random_page_cost, seq_page_cost, effective_cache_size, work_mem, cpu_tuple_cost, etc., directly affect cost calculations and must be identical between environments.

Key Technical Disagreement

An important exchange occurred between Corey Huinker (author of the statistics import/export feature) and David Rowley regarding how table_block_relation_estimate_size() works:

This correction is architecturally significant: it means statistics import alone cannot make the planner behave identically unless the physical table sizes also match. Huinker acknowledged the correction.

Proposed Solutions

Workaround (David Rowley)

Use get_relation_info_hook to override:

This hook-based approach would require an extension but could theoretically produce identical plans.

Documentation Patch (Yugo Nagata)

A documentation patch (CF entry: https://commitfest.postgresql.org/patch/6155/) adds a warning to the statistics manipulation functions documentation:

"Manually restored statistics do not guarantee that the same query plans will be generated as in the source environment, since factors such as relation sizes, index OIDs, and configuration parameters may affect planner behavior."

Cross-Reference Follow-up (Vellaipandiyan)

A follow-up documentation patch adds a cross-reference from the statistics manipulation warning to the planner statistics documentation section.

Limitations Even With Perfect Statistics Matching

Even if plans could be made identical, Huinker correctly notes that statistics import only helps with EXPLAIN plan generation in isolation. It cannot reproduce:

Status

The documentation patch is registered in commitfest. The thread represents an important clarification of the feature's intended scope versus user expectations, and the documentation improvement helps set appropriate expectations for the statistics import/export feature.