[PATCH] REPLICA IDENTITY USING INDEX accepts column with invalid NOT NULL

First seen: 2026-05-14 13:06:56+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-18 · claude-opus-4-6

Technical Analysis: REPLICA IDENTITY USING INDEX Accepts Columns with NOT NULL NOT VALID

Core Problem

This bug report identifies a logical replication correctness issue introduced as a side effect of the PostgreSQL 18 feature that added NOT NULL NOT VALID constraints (commit a379061a22a8). The problem lies at the intersection of two subsystems: constraint validation semantics and logical replication identity.

Background: Replica Identity and NOT NULL

Logical replication requires a way to uniquely identify rows on the subscriber side for UPDATE and DELETE operations. PostgreSQL supports several replica identity modes:

When REPLICA IDENTITY USING INDEX is used, PostgreSQL validates that the chosen index is suitable — specifically, it must be a unique, non-partial index where all key columns are NOT NULL. The NOT NULL requirement is critical: if a column can contain NULLs, the unique index cannot guarantee row uniqueness (since NULL ≠ NULL in SQL), and logical replication would be unable to unambiguously identify target rows on the subscriber.

The Bug

Prior to PG 18, attnotnull in pg_attribute was a reliable indicator that a column truly could not contain NULLs. The NOT NULL NOT VALID feature (a379061a22a8) changed this semantic: now attnotnull is set to true even when the NOT NULL constraint has not been validated against existing data. This means:

  1. A table can have existing NULL values in a column
  2. That column can have attnotnull = true (due to a NOT VALID constraint)
  3. A unique index on that column can be created (it doesn't enforce uniqueness on NULLs but allows multiple NULL entries)
  4. ALTER TABLE ... REPLICA IDENTITY USING INDEX checks attnotnull, sees true, and accepts the index

The result: logical replication will attempt to use this index to identify rows, but rows with NULL keys cannot be uniquely identified, causing data divergence between publisher and subscriber. This is a silent data corruption scenario — one of the most severe classes of bugs.

The Specific Code Path

The check happens in check_relation_can_be_replicated (or related validation in ATExecReplicaIdentity), which iterates over the index's key columns and verifies attnotnull. The fix needs to go beyond the boolean attnotnull flag and consult pg_constraint to verify the constraint is actually validated (convalidated = true).

Proposed Solution

The patch follows the precedent set by commit d9ffc27291f, which fixed the analogous problem for identity columns (GENERATED ALWAYS AS IDENTITY). The fix pattern is:

  1. After confirming attnotnull = true, perform an additional catalog lookup
  2. Find the corresponding NOT NULL constraint in pg_constraint
  3. Check if convalidated is true
  4. If the constraint is NOT VALID (i.e., convalidated = false), reject the index as a replica identity

This is a minimal, targeted fix that doesn't change the semantics of attnotnull itself but adds a validation guard specifically in the replica identity code path.

Architectural Implications

The attnotnull Semantic Change Problem

The introduction of NOT NULL NOT VALID fundamentally changed the meaning of attnotnull from "this column cannot contain NULLs" to "this column has a NOT NULL constraint declared (which may or may not be enforced)." This creates a systemic risk: any code path that previously relied on attnotnull as a guarantee of NULL absence may now be incorrect. The d9ffc27291f fix for identity columns and this patch represent two instances of the same class of bug. There may be others.

Performance Consideration

The additional catalog lookup (pg_constraint scan) adds overhead to the ALTER TABLE ... REPLICA IDENTITY USING INDEX command, but since this is a DDL operation executed infrequently, the performance impact is negligible.

Correctness vs. Usability Tradeoff

An alternative approach might be to allow the replica identity but warn, or to automatically validate the constraint first. However, since silent data divergence in replication is catastrophic, the strict rejection approach is correct. Users can always run ALTER TABLE ... VALIDATE CONSTRAINT first.

Severity Assessment

This is a high-severity correctness bug affecting PostgreSQL 18+. While it requires a specific sequence of operations to trigger (NOT NULL NOT VALID + unique index + replica identity), the failure mode (silent replication data divergence) is extremely dangerous in production logical replication setups.