[PATCH] Improve REPACK (CONCURRENTLY) error messages for unsupported configurations

First seen: 2026-05-27 03:06:13+00:00 · Messages: 9 · Participants: 3

Latest Update

2026-06-01 · claude-opus-4-6

Technical Analysis: Improving REPACK (CONCURRENTLY) Error Messages

Core Problem

The REPACK (CONCURRENTLY) feature, a relatively new addition to PostgreSQL that repacks a table online without holding an exclusive lock for an extended period, had several user-facing error paths that produced confusing, misleading, or poorly categorized error messages. The feature relies on logical replication infrastructure (replication slots, logical decoding) to track changes made during the repack, so it carries precondition requirements that users can violate without understanding why the operation fails.

Three distinct categories of error message deficiencies were identified:

1. wal_level Precondition (0001)

When wal_level < replica, the REPACK CONCURRENTLY path would fail deep inside CheckSlotRequirements(), surfacing a generic replication-slot error with a misleading CONTEXT line referencing an internal worker process. Users would see infrastructure-level errors rather than a clear "your configuration doesn't support this operation" message. The fix adds an upfront check in the REPACK code path itself, producing a REPACK-specific error before the operation reaches the replication slot machinery.
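A minimal sketch of such an upfront check, assuming a stand-in enum and a hypothetical helper name (repack_concurrently_precheck). The message wording is illustrative and is not the committed text; the real code would raise the error through ereport() rather than return a string.

```c
#include <stddef.h>

/*
 * Stand-in for the server's WAL level setting, ordered from least to most
 * WAL detail. Defined locally so the sketch compiles on its own.
 */
typedef enum
{
    WAL_LEVEL_MINIMAL,
    WAL_LEVEL_REPLICA,
    WAL_LEVEL_LOGICAL
} WalLevel;

/*
 * Hypothetical precheck run by the REPACK code path before it touches any
 * replication-slot machinery. Returns NULL when the configuration is
 * acceptable, otherwise a REPACK-specific message (illustrative wording).
 */
const char *
repack_concurrently_precheck(WalLevel wal_level)
{
    if (wal_level < WAL_LEVEL_REPLICA)
        return "REPACK (CONCURRENTLY) requires \"wal_level\" >= \"replica\"";

    return NULL;
}
```

Because the check runs first, a server at wal_level=minimal gets the REPACK-specific message and never reaches the generic slot error.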

2. Identity Index Disambiguation (0002)

check_concurrent_repack_requirements() calls GetRelationIdentityOrPK(), which returns InvalidOid for several distinct situations, but the error handling collapsed them all into a single "no identity index" message. This was misleading in at least two cases.

The fix infers the specific reason from relreplident, rd_ispkdeferrable, rd_pkindex, etc., and produces distinct error messages for each case.
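The inference could look roughly like the sketch below. The struct is a simplified stand-in for the relcache fields named above, and the set of cases (replica identity NOTHING, replica identity FULL, deferrable primary key, no primary key) is an assumption made for illustration; the patch's actual case split and messages may differ.

```c
#include <stdbool.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Simplified stand-in for the relation fields mentioned in the text. */
typedef struct
{
    char relreplident;      /* 'd' default, 'n' nothing, 'f' full, 'i' index */
    Oid  rd_pkindex;        /* usable primary key index, or InvalidOid */
    bool rd_ispkdeferrable; /* true if the primary key is deferrable */
} MockRelation;

/* Hypothetical reasons a lookup for an identity index could come back empty. */
typedef enum
{
    IDENT_REASON_NOTHING,       /* REPLICA IDENTITY NOTHING */
    IDENT_REASON_FULL,          /* REPLICA IDENTITY FULL: no index involved */
    IDENT_REASON_DEFERRABLE_PK, /* primary key exists but is deferrable */
    IDENT_REASON_NO_PK,         /* default identity and no primary key */
    IDENT_REASON_OTHER          /* anything this sketch does not model */
} IdentReason;

/* Classify why no identity index was found, so each case can get its own message. */
IdentReason
why_no_identity_index(const MockRelation *rel)
{
    switch (rel->relreplident)
    {
        case 'n':
            return IDENT_REASON_NOTHING;
        case 'f':
            return IDENT_REASON_FULL;
        case 'd':
            /* Test deferrability first: a deferrable PK may leave rd_pkindex unset. */
            if (rel->rd_ispkdeferrable)
                return IDENT_REASON_DEFERRABLE_PK;
            if (rel->rd_pkindex == InvalidOid)
                return IDENT_REASON_NO_PK;
            return IDENT_REASON_OTHER;
        default:
            return IDENT_REASON_OTHER;
    }
}
```

Each returned reason would then select a distinct error message in place of the single "no identity index" text.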

3. Missing errcodes (0003)

Four ereport(ERROR) calls in the REPACK CONCURRENTLY path lacked explicit errcode() specifications, so they defaulted to ERRCODE_INTERNAL_ERROR (SQLSTATE XX000). This is problematic for programmatic error handling, for example in client applications that branch on SQLSTATE. The fix assigns an explicit errcode to each of these calls.

Architectural Significance

This work matters because REPACK CONCURRENTLY sits at the intersection of several complex subsystems:

  1. Logical replication infrastructure (replication slots, logical decoding)
  2. Replica identity mechanism (determining which columns/indexes identify rows)
  3. Constraint semantics (deferrable vs. non-deferrable PKs)
  4. WAL configuration (wal_level requirements)

When these subsystems produce errors through their own generic paths, users cannot diagnose REPACK-specific precondition failures. The principle at work is: features that depend on lower-level infrastructure should validate preconditions at their own level before delegating to that infrastructure, providing domain-appropriate error messages.

Key Design Decisions and Tradeoffs

Errcode Selection for wal_level Check

Álvaro changed the proposed errcode from ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE to ERRCODE_INVALID_PARAMETER_VALUE, reasoning that "object" implies a database object, whereas wal_level is a server configuration parameter. This is a subtle but important distinction for SQLSTATE semantics.
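To show why the choice is visible to clients, here is a small sketch of client-side branching on the five-character SQLSTATE. The enum and function names are hypothetical; the SQLSTATE strings are the standard PostgreSQL values for the three conditions discussed in this analysis.

```c
#include <string.h>

/* Hypothetical client-side classification of the SQLSTATEs relevant here. */
typedef enum
{
    SQLSTATE_INVALID_PARAMETER_VALUE,          /* 22023 */
    SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE, /* 55000 */
    SQLSTATE_INTERNAL_ERROR,                   /* XX000 */
    SQLSTATE_UNCLASSIFIED
} SqlstateClass;

/* Map a SQLSTATE string reported by the server to one of the classes above. */
SqlstateClass
classify_sqlstate(const char *sqlstate)
{
    if (strcmp(sqlstate, "22023") == 0)
        return SQLSTATE_INVALID_PARAMETER_VALUE;
    if (strcmp(sqlstate, "55000") == 0)
        return SQLSTATE_OBJECT_NOT_IN_PREREQUISITE_STATE;
    if (strcmp(sqlstate, "XX000") == 0)
        return SQLSTATE_INTERNAL_ERROR;
    return SQLSTATE_UNCLASSIFIED;
}
```

A client using something like this can treat 22023 as a configuration problem to report to an administrator, whereas XX000 gives it nothing to distinguish a user error from a server bug.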

Error Message Scope for wal_level

Álvaro also changed the errmsg() from "cannot repack table X" to a more general formulation, because the wal_level restriction applies to all tables, not just the named one. Saying "cannot repack table X" implies table Y might work.

Maintainability vs. Specificity (0002 Controversy)

The central design tension was between diagnostic specificity (telling users exactly what is wrong) and maintainability (avoiding fragile inference logic that must track future feature changes). Chao raised three concrete concerns on the maintainability side.

Álvaro ultimately pushed 0002 anyway (in modified form), reasoning that the usability benefit outweighs the maintenance cost: the replica identity issue is an "unnecessary usability tripwire."

Error Message Style

Álvaro applied PostgreSQL's message style convention of "could not do X" rather than "failed to do X", though he noted he couldn't find this specific rule documented in the style guide.

Additional Coverage Work

After pushing the three patches, Álvaro identified remaining uncovered error paths and submitted a follow-up patch adding test coverage. The final state leaves only a few edge cases uncovered.

Buildfarm Impact

The 0001 patch (wal_level check) caused a buildfarm failure on thorntail, which runs with wal_level=minimal. The new upfront error message triggered during tests that previously passed because they never reached the replication slot code. Álvaro fixed this by moving some tests to test_decoding (which requires wal_level >= logical).

Broader Context

Chao's reference to a pending patch for REPLICA_IDENTITY_DEFAULT fallback to FULL ([1]) reveals ongoing work to relax REPACK CONCURRENTLY's requirements. This confirms the maintainability concern: the identity-checking logic in the error path will need updates as the feature evolves. However, this is a general tension in PostgreSQL development. Features evolve, and error messages that reference specific limitations will always need maintenance when those limitations are lifted.