2026-06-01 · claude-opus-4-6

Technical Analysis: Statistics Import/Export for Performance Testing

Core Problem

The thread addresses a fundamental question about the PostgreSQL statistics import/export feature (introduced in recent versions): can exported statistics from a production environment be used in a staging environment to reproduce the same query plans for performance testing?

This is a common enterprise scenario: organizations maintain separate production and staging environments with different data volumes but want to validate that application changes won't cause plan regressions in production. The Statistics Import and Export feature (pg_dump --statistics-only, pg_restore_relation_stats(), pg_restore_attribute_stats()) seems like it could solve this, but the reality is more nuanced.

Why This Matters Architecturally

The PostgreSQL planner does not rely solely on pg_statistic catalog entries to estimate costs. Several other factors influence plan selection:

1. Relation Size Estimation (`table_block_relation_estimate_size()`)

This is the critical architectural insight in the thread. The planner's row estimation works as follows:

density = reltuples / relpages   (from pg_class)
estimated_rows = density * actual_current_pages  (from filesystem)

The function RelationGetNumberOfBlocks(rel) queries the actual physical size of the relation on disk. This means even if you perfectly replicate pg_class.reltuples and pg_class.relpages via statistics import, the planner will still use the real number of blocks in the staging table to scale the tuple estimate. This design exists intentionally — it allows statistics to remain useful as tables grow between ANALYZE runs without becoming stale.

This is well-documented in the row estimation examples: the planner multiplies tuple density by current pages, making statistics import insufficient on its own when table sizes differ.

2. Index OID Ordering

When multiple paths have "fuzzily equal" costs (see add_path() and compare_path_costs_fuzzily()), the tiebreaker can depend on the order indexes are presented. Since get_relation_info() sorts indexes by OID, and OIDs are not guaranteed to match after dump/restore, plan choices could differ even with identical cost estimates. This is an edge case but represents a real source of non-determinism.

3. GUC Configuration

Parameters like random_page_cost, seq_page_cost, effective_cache_size, work_mem, cpu_tuple_cost, etc., directly affect cost calculations and must be identical between environments.

Key Technical Disagreement

An important exchange occurred between Corey Huinker (author of the statistics import/export feature) and David Rowley regarding how table_block_relation_estimate_size() works:

Huinker initially claimed that the function determines relation size from pg_class values (which are set by pg_restore_relation_stats()), implying statistics import would be sufficient.
Rowley corrected this, pointing out that while pg_class.relpages and pg_class.reltuples are used to compute tuple density, the actual page count comes from RelationGetNumberOfBlocks(rel) — a filesystem-level call that returns the real table size.

This correction is architecturally significant: it means statistics import alone cannot make the planner behave identically unless the physical table sizes also match. Huinker acknowledged the correction.

Proposed Solutions

Workaround (David Rowley)

Use get_relation_info_hook to override:

Relation sizes (to match production)
Index ordering (to ensure deterministic tiebreaking)

This hook-based approach would require an extension but could theoretically produce identical plans.

Documentation Patch (Yugo Nagata)

A documentation patch (CF entry: https://commitfest.postgresql.org/patch/6155/) adds a warning to the statistics manipulation functions documentation:

"Manually restored statistics do not guarantee that the same query plans will be generated as in the source environment, since factors such as relation sizes, index OIDs, and configuration parameters may affect planner behavior."

Cross-Reference Follow-up (Vellaipandiyan)

A follow-up documentation patch adds a cross-reference from the statistics manipulation warning to the planner statistics documentation section.

Limitations Even With Perfect Statistics Matching

Even if plans could be made identical, Huinker correctly notes that statistics import only helps with EXPLAIN plan generation in isolation. It cannot reproduce:

work_mem overflow behavior with real data volumes
Buffer cache behavior and I/O patterns
Lock contention from concurrent workloads
Actual tuple processing costs with different data distributions

Status

The documentation patch is registered in commitfest. The thread represents an important clarification of the feature's intended scope versus user expectations, and the documentation improvement helps set appropriate expectations for the statistics import/export feature.

Can we use Statistics Import and Export feature to perforamance testing?

Latest Update

Technical Analysis: Statistics Import/Export for Performance Testing

Core Problem

Why This Matters Architecturally

1. Relation Size Estimation (`table_block_relation_estimate_size()`)

2. Index OID Ordering

3. GUC Configuration

Key Technical Disagreement

Proposed Solutions

Workaround (David Rowley)

Documentation Patch (Yugo Nagata)

Cross-Reference Follow-up (Vellaipandiyan)

Limitations Even With Perfect Statistics Matching

Status

Can we use Statistics Import and Export feature to perforamance testing?

Latest Update

Technical Analysis: Statistics Import/Export for Performance Testing

Core Problem

Why This Matters Architecturally

1. Relation Size Estimation (table_block_relation_estimate_size())

2. Index OID Ordering

3. GUC Configuration

Key Technical Disagreement

Proposed Solutions

Workaround (David Rowley)

Documentation Patch (Yugo Nagata)

Cross-Reference Follow-up (Vellaipandiyan)

Limitations Even With Perfect Statistics Matching

Status

1. Relation Size Estimation (`table_block_relation_estimate_size()`)