Technical Analysis: Improving REPACK (CONCURRENTLY) Error Messages
Core Problem
The REPACK (CONCURRENTLY) feature is a relatively new addition to PostgreSQL that allows online table repacking without holding exclusive locks for extended periods. Several of its user-facing error paths produced confusing, misleading, or poorly categorized error messages. The feature relies on logical replication infrastructure (replication slots, logical decoding) to track changes during the repack operation, which introduces complex precondition requirements that users may violate without understanding why their operation fails.
Three distinct categories of error message deficiencies were identified:
1. wal_level Precondition (0001)
When wal_level < replica, the REPACK CONCURRENTLY path would fail deep inside CheckSlotRequirements(), surfacing a generic replication-slot error with a misleading CONTEXT line referencing an internal worker process. Users would see infrastructure-level errors rather than a clear "your configuration doesn't support this operation" message. The fix adds an upfront check in the REPACK code path itself, producing a REPACK-specific error before the operation reaches the replication slot machinery.
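The fail-fast idea can be modeled outside the server as a minimal Python sketch. Everything named here (`WalLevel`, `RepackPreconditionError`, `check_wal_level_for_repack`, `repack_concurrently`) is hypothetical, and the message wording is illustrative rather than the committed text. The SQLSTATE `22023` corresponds to ERRCODE_INVALID_PARAMETER_VALUE.

```python
from enum import IntEnum


class WalLevel(IntEnum):
    """Ordered model of the wal_level setting (minimal < replica < logical)."""
    MINIMAL = 0
    REPLICA = 1
    LOGICAL = 2


class RepackPreconditionError(Exception):
    """Hypothetical error type carrying a SQLSTATE, for illustration only."""

    def __init__(self, message: str, sqlstate: str):
        super().__init__(message)
        self.sqlstate = sqlstate


def check_wal_level_for_repack(wal_level: WalLevel) -> None:
    """Fail at the feature's own level if wal_level is too low."""
    if wal_level < WalLevel.REPLICA:
        # Illustrative wording; not the committed message text.
        raise RepackPreconditionError(
            "REPACK (CONCURRENTLY) requires wal_level to be at least replica",
            sqlstate="22023",  # ERRCODE_INVALID_PARAMETER_VALUE
        )


def repack_concurrently(wal_level: WalLevel) -> str:
    """Validate preconditions first, then hand off to lower-level machinery."""
    check_wal_level_for_repack(wal_level)
    # Stand-in for the replication-slot setup that would follow.
    return "slot setup reached"
```

Because the check runs before any slot machinery is touched, the caller sees a REPACK-level error instead of one raised by a lower layer.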
2. Identity Index Disambiguation (0002)
check_concurrent_repack_requirements() calls GetRelationIdentityOrPK(), which returns InvalidOid in several distinct situations, but the error handling collapsed them all into a single "no identity index" message. This was misleading in at least two cases:
- REPLICA IDENTITY FULL: The table has a replica identity set, yet the error says there is no identity index.
- Deferrable PK: The table has a primary key that is skipped due to the restriction added in commit 832e220d99a (deferrable constraints cannot be used for logical replication identity purposes), but the hint suggests adding an index that already exists.
The fix infers the specific reason from relreplident, rd_ispkdeferrable, rd_pkindex, etc., and produces distinct error messages for each case.
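The inference step can be illustrated with a small Python model. This is a sketch under assumptions, not the patch's logic: the `RelationInfo` class, the reason labels, and the order of the checks are hypothetical. Only the field names (`relreplident`, `rd_pkindex`, `rd_ispkdeferrable`) and the two misleading cases come from the discussion above.

```python
from dataclasses import dataclass

INVALID_OID = 0  # stand-in for PostgreSQL's InvalidOid


@dataclass
class RelationInfo:
    """Hypothetical model of the relcache fields the fix inspects."""
    relreplident: str        # 'd' = default, 'f' = full, 'i' = index, 'n' = nothing
    rd_pkindex: int          # OID of the primary key index, or INVALID_OID
    rd_ispkdeferrable: bool  # True if the primary key is deferrable


def explain_missing_identity(rel: RelationInfo) -> str:
    """Classify why no usable identity index was found (assumed ordering)."""
    if rel.relreplident == "f":
        # A replica identity exists, but it is not backed by an index.
        return "replica identity is FULL"
    if rel.rd_pkindex != INVALID_OID and rel.rd_ispkdeferrable:
        # The primary key exists but is skipped because it is deferrable.
        return "primary key is deferrable"
    # Fallback: the generic case the original single message described.
    return "no identity index"
```

Each label would map to its own error message and hint, so a table with a deferrable primary key is no longer told to add an index it already has.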
3. Missing errcodes (0003)
Four ereport(ERROR) calls in the REPACK CONCURRENTLY path lacked explicit errcode() specifications, defaulting to ERRCODE_INTERNAL_ERROR. This is problematic for programmatic error handling (e.g., client applications that branch on SQLSTATE). The fix maps:
- Apply-phase update/delete failures → ERRCODE_T_R_SERIALIZATION_FAILURE
- Configuration errors → appropriate codes
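To show why the SQLSTATE matters to clients, here is a minimal Python sketch of a caller that retries only on serialization failures. The `DatabaseError` class and `run_with_retry` helper are hypothetical and no real driver API is used. `40001` and `XX000` are the SQLSTATE values for ERRCODE_T_R_SERIALIZATION_FAILURE and ERRCODE_INTERNAL_ERROR respectively.

```python
SERIALIZATION_FAILURE = "40001"  # ERRCODE_T_R_SERIALIZATION_FAILURE
INTERNAL_ERROR = "XX000"         # ERRCODE_INTERNAL_ERROR


class DatabaseError(Exception):
    """Hypothetical driver-style error exposing the server's SQLSTATE."""

    def __init__(self, message: str, sqlstate: str):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(operation, max_attempts: int = 3):
    """Call operation(), retrying only when it fails with SQLSTATE 40001."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DatabaseError as exc:
            retryable = exc.sqlstate == SERIALIZATION_FAILURE
            if not retryable or attempt == max_attempts:
                # Non-retryable codes (e.g. XX000) propagate immediately.
                raise
```

With the default ERRCODE_INTERNAL_ERROR, a client written this way could not tell a transient apply-phase conflict from a genuine internal bug and would never retry.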
Architectural Significance
This work matters because REPACK CONCURRENTLY sits at the intersection of several complex subsystems:
- Logical replication infrastructure (replication slots, logical decoding)
- Replica identity mechanism (determining which columns/indexes identify rows)
- Constraint semantics (deferrable vs. non-deferrable PKs)
- WAL configuration (wal_level requirements)
When these subsystems produce errors through their own generic paths, users cannot diagnose REPACK-specific precondition failures. The principle at work is: features that depend on lower-level infrastructure should validate preconditions at their own level before delegating to that infrastructure, providing domain-appropriate error messages.
Key Design Decisions and Tradeoffs
Errcode Selection for wal_level Check
Álvaro changed the proposed errcode from ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE to ERRCODE_INVALID_PARAMETER_VALUE, reasoning that "object" implies a database object, whereas wal_level is a server configuration parameter. This is a subtle but important distinction for SQLSTATE semantics.
Error Message Scope for wal_level
Álvaro also changed the errmsg() from "cannot repack table X" to a more general formulation, because the wal_level restriction applies to all tables, not just the named one. Saying "cannot repack table X" implies table Y might work.
Maintainability vs. Specificity (0002 Controversy)
The central design tension was between diagnostic specificity (telling users exactly what's wrong) and maintainability (avoiding fragile inference logic that must track future feature changes). Chao raised three concrete concerns:
- The inference logic examines relreplident, rd_ispkdeferrable, etc. after GetRelationIdentityOrPK() returns InvalidOid; this coupling means future changes to the requirements must update these checks.
- The hint for deferrable PK is very prescriptive ("ALTER CONSTRAINT ... NOT DEFERRABLE") but may not be the only solution.
- A pending patch to allow REPLICA_IDENTITY_DEFAULT to fall back to FULL would make the FULL-specific error misleading.
Álvaro ultimately pushed 0002 anyway (in modified form), reasoning that the usability benefit outweighs the maintenance cost — the replica identity issue is an "unnecessary usability tripwire."
Error Message Style
Álvaro applied PostgreSQL's message style convention of "could not do X" rather than "failed to do X", though he noted he couldn't find this specific rule documented in the style guide.
Additional Coverage Work
After pushing the three patches, Álvaro identified remaining uncovered error paths and submitted a follow-up patch adding test coverage. The final state leaves only a few edge cases uncovered:
- Repacking a temp table belonging to another session (requires isolation test)
- Invalid index (line 813) — unusual setup required
- Shared catalog with USING INDEX (line 580, needs allow_system_table_mods)
- Various elog(ERROR) can't-happen defensive checks
Buildfarm Impact
The 0001 patch (wal_level check) caused a buildfarm failure on thorntail, which runs with wal_level=minimal. The new upfront error message triggered during tests that previously passed because they never reached the replication slot code. Álvaro fixed this by moving some tests to test_decoding (which requires wal_level >= logical).
Broader Context
Chao's reference to a pending patch for REPLICA_IDENTITY_DEFAULT fallback to FULL ([1]) reveals ongoing work to relax REPACK CONCURRENTLY's requirements. This confirms the maintainability concern: the identity-checking logic in the error path will need updates as the feature evolves. However, this is a general tension in PostgreSQL development — features evolve, and error messages that reference specific limitations will always need maintenance when those limitations are lifted.