CREATE INDEX CONCURRENTLY on Partitioned Tables: Deep Technical Analysis
The Core Problem
PostgreSQL's CREATE INDEX CONCURRENTLY (CIC) has never supported partitioned tables. When a user issues CIC on a partitioned table, they receive an error, forcing them to manually create indexes on each partition individually and then attach them to a parent index using ALTER INDEX ... ATTACH PARTITION. For tables with hundreds or thousands of partitions, this is operationally painful and error-prone.
The fundamental challenge is architectural: CIC's multi-phase protocol (create catalog entry → build index → wait for old transactions → validate → mark valid) was designed for single relations. Extending it to partitioned tables requires orchestrating this protocol across an entire partition hierarchy while maintaining the non-blocking property that makes CIC valuable.
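The phase sequence above can be modeled as a small state machine. This is a simplified sketch under stated assumptions, not PostgreSQL source. The names (CicPhase, IndexState, run_cic) are hypothetical. The flags only loosely mirror pg_index.indisready and pg_index.indisvalid. The model captures one property: the phases are separated by commits, so an interruption keeps the effects of every phase that already finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Phases in the order the text lists them (hypothetical names). */
typedef enum {
    PHASE_CATALOG_ENTRY,   /* catalog row exists; not ready, not valid */
    PHASE_BUILD,           /* initial build; afterwards writes maintain it */
    PHASE_WAIT_OLD_XACTS,  /* wait out transactions older than the build */
    PHASE_VALIDATE,        /* second pass adds tuples the build missed */
    PHASE_MARK_VALID,      /* flip the valid flag so queries may use it */
    PHASE_COUNT
} CicPhase;

typedef struct {
    bool exists;      /* catalog entry committed */
    bool indisready;  /* new writes maintain the index */
    bool indisvalid;  /* planner may use the index */
} IndexState;

/* Run phases [0, stop_before). Each phase is assumed to commit on its own,
 * so stopping early leaves the earlier phases' effects in place. */
static IndexState run_cic(CicPhase stop_before)
{
    IndexState st = { false, false, false };
    assert(stop_before <= PHASE_COUNT);
    for (int p = 0; p < (int) stop_before; p++) {
        switch ((CicPhase) p) {
        case PHASE_CATALOG_ENTRY:  st.exists = true; break;
        case PHASE_BUILD:          st.indisready = true; break;
        case PHASE_WAIT_OLD_XACTS: break;  /* waiting only, no flag change */
        case PHASE_VALIDATE:       break;  /* fills in missed tuples */
        case PHASE_MARK_VALID:     st.indisvalid = true; break;
        case PHASE_COUNT:          break;
        }
    }
    return st;
}
```

Calling run_cic(PHASE_VALIDATE) stops before validation. The result is an index that exists and is maintained but is not valid, which is the INVALID leftover described later under failure semantics.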
Architectural Approach and Evolution
Phase 1: Reindex-Based Strategy (2020-2022)
The original approach by Justin Pryzby leveraged existing REINDEX CONCURRENTLY infrastructure:
- Create catalog entries for all partitions with indisvalid=false
- Reindex them concurrently using ReindexRelationConcurrently(), which already handles the multi-phase CIC protocol
This was elegant in code reuse but had significant drawbacks:
- Indexes got _ccnew suffixes (the REINDEX naming convention), confusing users
- On failure, leftover indexes had unexpected names that didn't participate in DROP INDEX cascading
- Progress reporting conflicted between CREATE INDEX and REINDEX command tracking
- The REINDEX path processes all indexes in lockstep across phases, which differs from per-index CIC semantics
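The difference between lockstep and per-index processing comes down to loop order. This sketch is a simplified model with hypothetical names. It collapses the protocol to three phases (build, validate, mark valid) and compares phase-major order with index-major order. One consequence is inferred here rather than stated by the source: after an interruption partway through, lockstep processing has finished no index, while per-index processing has already finished some.

```c
#include <assert.h>
#include <stddef.h>

/* One unit of work: which index, which phase (0 build, 1 validate, 2 mark valid). */
typedef struct { int index; int phase; } Step;

enum { NPHASES = 3 };

/* Lockstep, REINDEX CONCURRENTLY style: every index finishes a phase
 * before any index starts the next one (phase-major order). */
static size_t schedule_lockstep(int nindexes, Step *out)
{
    size_t n = 0;
    assert(nindexes >= 0);
    for (int ph = 0; ph < NPHASES; ph++)
        for (int i = 0; i < nindexes; i++)
            out[n++] = (Step){ i, ph };
    return n;
}

/* Per-index, CIC style: each index runs the whole protocol before the
 * next one starts (index-major order). */
static size_t schedule_per_index(int nindexes, Step *out)
{
    size_t n = 0;
    assert(nindexes >= 0);
    for (int i = 0; i < nindexes; i++)
        for (int ph = 0; ph < NPHASES; ph++)
            out[n++] = (Step){ i, ph };
    return n;
}

/* How many indexes completed their final phase within the first k steps. */
static int completed_after(const Step *s, size_t k)
{
    int done = 0;
    for (size_t j = 0; j < k; j++)
        if (s[j].phase == NPHASES - 1)
            done++;
    return done;
}
```

With three indexes, the sketch stops both schedules after six of nine steps. At that point the lockstep schedule has completed none, and the per-index schedule has completed two.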
Phase 2: Native CIC Implementation (December 2022 onwards)
Pryzby refactored the patch to extract DefineIndexConcurrentInternal(), the concurrent portion of DefineIndex(), into a separate function. The new approach:
- Create catalog entries for all partition indexes (non-concurrently, within a single transaction)
- Loop over each leaf partition, calling DefineIndexConcurrentInternal() to build each index concurrently
- Mark intermediate partitioned indexes as valid once all their children succeed
This eliminated the _ccnew naming problem and produced a smaller, more maintainable patch.
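The three steps of the native approach can be sketched for a single-level hierarchy. The sketch makes several assumptions. The types and functions (Idx, PartitionedIndex, build_leaf_concurrently, create_index_concurrently) are hypothetical. build_leaf_concurrently only stands in for the role the source assigns to DefineIndexConcurrentInternal(). A failure is injected at a chosen leaf so that the invalid leftovers are visible.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PARTS 16

typedef struct {
    bool exists;  /* catalog entry committed */
    bool valid;   /* usable by queries */
} Idx;

typedef struct {
    Idx parent;              /* index on the partitioned table */
    Idx leaf[MAX_PARTS];     /* one index per leaf partition */
    int nleaves;
} PartitionedIndex;

/* Hypothetical stand-in for a per-leaf concurrent build; it "fails" on
 * leaf number fail_at (for example a cancel or a unique violation). */
static bool build_leaf_concurrently(Idx *leaf, int leafno, int fail_at)
{
    if (leafno == fail_at)
        return false;
    leaf->valid = true;
    return true;
}

/* Returns how many leaves were built before stopping. */
static int create_index_concurrently(PartitionedIndex *pi, int fail_at)
{
    assert(pi->nleaves >= 0 && pi->nleaves <= MAX_PARTS);

    /* Step 1: every catalog entry up front, all invalid. */
    pi->parent = (Idx){ true, false };
    for (int i = 0; i < pi->nleaves; i++)
        pi->leaf[i] = (Idx){ true, false };

    /* Step 2: build leaves one at a time, each with the full CIC protocol. */
    int built = 0;
    for (int i = 0; i < pi->nleaves; i++) {
        if (!build_leaf_concurrently(&pi->leaf[i], i, fail_at))
            return built;        /* the parent stays invalid */
        built++;
    }

    /* Step 3: all children succeeded, so the parent can be marked valid. */
    pi->parent.valid = true;
    return built;
}
```

If leaf 2 of 4 fails, the sketch leaves leaves 0 and 1 valid, leaves 2 and 3 as existing but invalid, and the parent invalid.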
Key Technical Challenges
Locking Protocol
CIC's defining characteristic is using ShareUpdateExclusiveLock instead of ShareLock, avoiding write blocking. For partitioned tables, several locking questions arose:
- Child relation locking: Early versions used ShareLock to obtain the list of children, which would block writes and defeat CIC's purpose. Later versions corrected this to ShareUpdateExclusiveLock.
- Session-level locks: CIC holds a session-level lock across transaction boundaries. Whether to hold this lock on the parent table while child indexes are being built was debated: releasing it between children allows concurrent DDL (such as partition drops) but risks errors.
- Deadlock with concurrent DDL: If all partitions are locked in the first transaction, the partitions built last stay locked for the entire operation. The final approach mirrors REINDEX CONCURRENTLY: skip relations that turn out to have been dropped when CIC attempts to lock them.
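The choice between ShareLock and ShareUpdateExclusiveLock can be checked against PostgreSQL's table-level lock conflict table. The sketch below transcribes that table as bitmasks and uses the mode numbering from lockdefs.h. The identifiers are shortened and hypothetical. This is a re-creation of the documented table, not the lock manager's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, numbered as in lockdefs.h (names shortened). */
typedef enum {
    NO_LOCK = 0, ACCESS_SHARE, ROW_SHARE, ROW_EXCLUSIVE,
    SHARE_UPDATE_EXCLUSIVE, SHARE, SHARE_ROW_EXCLUSIVE,
    EXCLUSIVE, ACCESS_EXCLUSIVE, NUM_LOCKMODES
} LockMode;

#define BIT(m) (1u << (m))

/* conflicts[m]: held modes that block a request for mode m. */
static const unsigned conflicts[NUM_LOCKMODES] = {
    [NO_LOCK]                = 0,
    [ACCESS_SHARE]           = BIT(ACCESS_EXCLUSIVE),
    [ROW_SHARE]              = BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [ROW_EXCLUSIVE]          = BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                               BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [SHARE_UPDATE_EXCLUSIVE] = BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                               BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                               BIT(ACCESS_EXCLUSIVE),
    [SHARE]                  = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                               BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                               BIT(ACCESS_EXCLUSIVE),
    [SHARE_ROW_EXCLUSIVE]    = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                               BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                               BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [EXCLUSIVE]              = BIT(ROW_SHARE) | BIT(ROW_EXCLUSIVE) |
                               BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                               BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                               BIT(ACCESS_EXCLUSIVE),
    [ACCESS_EXCLUSIVE]       = BIT(ACCESS_SHARE) | BIT(ROW_SHARE) |
                               BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                               BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                               BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
};

/* True if a request for `requested` must wait while `held` is held. */
static bool lock_conflicts(LockMode requested, LockMode held)
{
    assert(requested < NUM_LOCKMODES && held < NUM_LOCKMODES);
    return (conflicts[requested] & BIT(held)) != 0;
}
```

INSERT, UPDATE and DELETE take ROW EXCLUSIVE. Those writes conflict with SHARE but not with SHARE UPDATE EXCLUSIVE, which is why a ShareLock on the children defeats the purpose of CIC. DROP TABLE takes ACCESS EXCLUSIVE, so it still waits for any partition that CIC currently holds. That wait is the source of the long lock times under the lock-all approach.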
Handling Concurrent Partition Changes
A critical correctness issue: what happens if a partition is dropped or detached while CIC is running? The evolution of solutions:
- Early versions: No protection — resulted in "cache lookup failed for index" errors
- Lock-all approach: Lock all partitions upfront — caused long lock times for later partitions
- Skip-if-dropped approach (final): Try to lock each partition when processing it; if it's been dropped, skip it gracefully. This mirrors REINDEX CONCURRENTLY's strategy.
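The skip-if-dropped strategy can be sketched as a loop that looks up each partition only when it reaches it. The sketch is a simplified model. Partition, try_lock_partition and build_all are hypothetical names. try_lock_partition does not correspond to a specific PostgreSQL function. A "concurrent drop" is simulated by clearing a flag just before a given step.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PARTS 16

typedef struct {
    int  oid;      /* hypothetical relation id, captured when the list was built */
    bool exists;   /* false once another session's DROP/DETACH commits */
    bool built;
} Partition;

/* Hypothetical stand-in for "lock the relation if it still exists". */
static bool try_lock_partition(const Partition *p)
{
    return p->exists;
}

/* drop_before[i] >= 0 means another session drops partition drop_before[i]
 * just before step i. Returns the number of partitions skipped. */
static int build_all(Partition *parts, int n, const int *drop_before)
{
    int skipped = 0;
    assert(n >= 0 && n <= MAX_PARTS);
    for (int i = 0; i < n; i++) {
        if (drop_before[i] >= 0)
            parts[drop_before[i]].exists = false;   /* concurrent DDL */
        if (!try_lock_partition(&parts[i])) {
            skipped++;        /* gone since the OID list was taken: skip it */
            continue;
        }
        parts[i].built = true;  /* locked only while this one is built */
    }
    return skipped;
}
```

Dropping a partition before the loop reaches it causes a skip rather than a "cache lookup failed" error. Dropping one that was already built changes nothing in the loop.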
Intermediate Partitioned Indexes
A subtle bug: in a multi-level partition hierarchy (e.g., partitioned table → sub-partitioned table → leaf), after successfully building all leaf indexes, the intermediate partitioned table indexes must also be marked as valid. This required tracking which OIDs are partitioned indexes vs. leaf indexes.
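The bookkeeping for intermediate indexes amounts to a post-order pass over the index hierarchy. The sketch is a hypothetical model: the IndexNode type and the propagate_validity function are illustrative and do not come from the patch. A partitioned index becomes valid only when every child, whether a leaf or another partitioned index, is valid.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHILDREN 8

/* Hypothetical index-hierarchy node. */
typedef struct IndexNode {
    bool partitioned;                      /* index on a partitioned table */
    bool valid;                            /* for leaves: set by their build */
    int  nchildren;
    struct IndexNode *children[MAX_CHILDREN];
} IndexNode;

/* Post-order: mark each partitioned index valid iff all children are valid.
 * Returns the validity of node n. */
static bool propagate_validity(IndexNode *n)
{
    if (!n->partitioned)
        return n->valid;
    assert(n->nchildren >= 0 && n->nchildren <= MAX_CHILDREN);
    bool all = true;
    for (int i = 0; i < n->nchildren; i++)
        if (!propagate_validity(n->children[i]))
            all = false;   /* keep going so sibling subtrees are updated too */
    n->valid = all;
    return all;
}
```

In this model, an intermediate index whose own leaves all succeeded becomes valid even while the top-level index stays invalid because some other branch failed.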
Progress Reporting
pg_stat_progress_create_index was designed for single-relation operations. Challenges:
- ReindexRelationConcurrently() calls pgstat_progress_start_command(), overwriting CREATE INDEX progress
- Tracking PROGRESS_CREATEIDX_PARTITIONS_DONE requires coordination between the CIC loop and the reindex internals
- A REINDEXOPT_REPORT_CREATE_PART flag was introduced to suppress per-relation progress in favor of partition-level progress
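The progress-clobbering problem and the role of the suppression flag can be sketched with a single in-process progress slot. The model makes assumptions. ProgressSlot, progress_start and OPT_REPORT_CREATE_PART are hypothetical. OPT_REPORT_CREATE_PART only stands in for REINDEXOPT_REPORT_CREATE_PART, whose real value the source does not give. The model also assumes that starting a command resets the counters, as a fresh start of progress reporting does.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CMD_NONE, CMD_CREATE_INDEX, CMD_REINDEX } ProgressCommand;

/* Hypothetical per-backend progress slot. */
typedef struct {
    ProgressCommand command;
    long partitions_total;
    long partitions_done;
} ProgressSlot;

/* Starting a command resets all counters in this model. */
static void progress_start(ProgressSlot *s, ProgressCommand cmd)
{
    s->command = cmd;
    s->partitions_total = 0;
    s->partitions_done = 0;
}

#define OPT_REPORT_CREATE_PART 0x01u   /* hypothetical stand-in flag */

/* Inner per-partition routine, standing in for the reindex internals. */
static void build_one_partition(ProgressSlot *s, unsigned options)
{
    if (!(options & OPT_REPORT_CREATE_PART))
        progress_start(s, CMD_REINDEX);   /* clobbers the caller's counters */
    /* ... build the index on this partition ... */
}

static void create_index_on_partitions(ProgressSlot *s, int nparts, unsigned options)
{
    assert(nparts >= 0);
    progress_start(s, CMD_CREATE_INDEX);
    s->partitions_total = nparts;
    for (int i = 0; i < nparts; i++) {
        build_one_partition(s, options);
        s->partitions_done++;
    }
}
```

Without the flag, the slot ends up reporting a REINDEX with a zero partition total. With the flag, it reports CREATE INDEX with every partition counted.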
Snapshot Management Bug (2025)
An assertion failure occurred on partitioned tables without leaf partitions — CIC would attempt to pop an active snapshot that was never pushed. This paralleled a fix in REINDEX (c426f7c2b36a) for event triggers and required guarding the snapshot pop with a check.
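The shape of this bug, a pop with no matching push, can be sketched with a counter-based stack. PostgreSQL's real pair is PushActiveSnapshot()/PopActiveSnapshot(). The source does not show where the patch pushes or exactly how it guards the pop. The names below (SnapshotStack, cic_tail) are therefore hypothetical, and the guard is one plausible form rather than the committed fix.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SNAP 8

/* Hypothetical active-snapshot stack; only its depth matters here. */
typedef struct { int depth; } SnapshotStack;

static void push_snapshot(SnapshotStack *s)
{
    assert(s->depth < MAX_SNAP);
    s->depth++;
}

/* This assertion plays the role of the one that fired. */
static void pop_snapshot(SnapshotStack *s)
{
    assert(s->depth > 0);
    s->depth--;
}

/* A snapshot is pushed only when there is leaf work. The pop is guarded by
 * the same condition, so a hierarchy with no leaves never pops. */
static void cic_tail(SnapshotStack *s, int nleaves)
{
    bool pushed = false;
    if (nleaves > 0) {
        push_snapshot(s);
        pushed = true;
        /* ... per-leaf work ... */
    }
    if (pushed)
        pop_snapshot(s);
}
```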
childStmt Concurrent Property Loss (2026)
The most recent fix addresses a regression where childStmt (the IndexStmt passed to child partition index creation) lost its concurrent property during processing, causing the index to be built non-concurrently despite the user's request.
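The source does not say how childStmt lost the flag, so the sketch below does not reconstruct the bug. It illustrates one defensive pattern: derive the child statement from a full copy of the parent and override only the per-child fields. In PostgreSQL, copyObject() would make that copy for a node tree, and its copy is deep, unlike the struct assignment used here. MiniIndexStmt is a cut-down hypothetical stand-in for IndexStmt, whose real definition has many more fields, including its concurrent flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for IndexStmt. */
typedef struct {
    const char *idxname;
    unsigned    relid;       /* hypothetical: target relation */
    bool        concurrent;  /* should this be a concurrent build? */
    bool        unique;
} MiniIndexStmt;

/* Copy everything from the parent statement, then override only the
 * fields that genuinely differ per child. Shallow copy: fine for this
 * flat struct, not for real node trees. */
static MiniIndexStmt make_child_stmt(const MiniIndexStmt *parent,
                                     unsigned child_relid,
                                     const char *child_name)
{
    MiniIndexStmt child = *parent;   /* carries concurrent, unique, ... */
    child.relid = child_relid;
    child.idxname = child_name;
    assert(child.concurrent == parent->concurrent);   /* the invariant at stake */
    return child;
}
```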
Failure Semantics
The patch preserves CIC's existing failure semantics:
- If interrupted, INVALID indexes remain on partitions that were being processed
- Users can clean up with DROP INDEX or REINDEX CONCURRENTLY
- The parent partitioned index remains INVALID until all children succeed
- This is documented behavior consistent with single-table CIC
Code Organization
The final patch structure:
- Extracts DefineIndexConcurrentInternal(), the three-phase CIC protocol (build, wait, validate), from DefineIndex()
- Adds a loop in DefineIndex() that, for partitioned tables with CONCURRENTLY, iterates over leaf partitions calling the extracted function
- Introduces a REINDEXOPT_SKIPVALID flag to skip already-valid indexes during the reindex phase
- Adds isolation tests exercising concurrent partition drop/detach during CIC
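A skip-valid option can be sketched as a bit flag that filters the list of indexes to process. OPT_SKIPVALID only stands in for REINDEXOPT_SKIPVALID; its value (0x04) and the IndexEntry type are hypothetical. One plausible use is assumed here rather than taken from the source: when an operation is rerun, indexes that are already valid are left alone and only the INVALID ones are rebuilt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OPT_SKIPVALID 0x04u   /* hypothetical stand-in; real value not given */

typedef struct {
    unsigned oid;    /* hypothetical index id */
    bool     valid;
} IndexEntry;

/* Write the OIDs that need processing to out[]; return how many. With
 * OPT_SKIPVALID, entries that are already valid are filtered out. */
static size_t indexes_to_process(const IndexEntry *idx, size_t n,
                                 unsigned options, unsigned *out)
{
    size_t k = 0;
    assert(idx != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((options & OPT_SKIPVALID) && idx[i].valid)
            continue;
        out[k++] = idx[i].oid;
    }
    return k;
}
```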
Current Status
As of the latest message (May 2026), the patch is being maintained by Alexander Pyhalov, with the most recent fix addressing the childStmt concurrent property loss. The patch has never attracted sustained committer attention despite being functionally complete for several years, which has been a recurring frustration expressed by the authors.