[PATCH] Fix NULL dereference in subscription REFRESH on concurrent DROP

First seen: 2026-05-24 07:57:11+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-25 · claude-opus-4-6

Technical Analysis: Fix NULL Dereference in Subscription REFRESH on Concurrent DROP

Core Problem

This thread addresses a crash bug (segfault) in PostgreSQL's logical replication subscription management code. The crash occurs when ALTER SUBSCRIPTION ... REFRESH PUBLICATION runs concurrently with a DROP TABLE (or DROP SEQUENCE) on a relation that is part of the subscription.

Architectural Context

PostgreSQL's logical replication uses subscriptions to track which publications (and their underlying tables/sequences) a subscriber node should replicate. When a user issues ALTER SUBSCRIPTION ... REFRESH PUBLICATION, the system must reconcile the current state of published tables with what the subscription knows about. This involves:

  1. Collecting a list of OIDs for locally-subscribed relations (subrel_local_oids)
  2. Iterating over those OIDs to check origin information
  3. Calling get_rel_name() to resolve OIDs to relation names for diagnostic/error messages

The Race Condition

The root cause is that no relation-level locks are held while check_publications_origin_tables() (and the analogous check_publications_origin_sequences()) iterates over the collected OIDs. The sequence of events is:

  1. Session A begins ALTER SUBSCRIPTION ... REFRESH PUBLICATION and collects subrel_local_oids, a list of OIDs for relations currently in the subscription.
  2. Session B concurrently executes DROP TABLE on one of those relations, removing it from the catalog.
  3. Session A continues iterating and calls get_rel_name(oid) for the now-dropped relation. Since the catalog entry no longer exists, get_rel_name() returns NULL.
  4. That NULL pointer is passed directly to quote_literal_cstr(), which unconditionally dereferences it, causing a segmentation fault.

This is a classic TOCTOU (Time-of-Check-to-Time-of-Use) race condition. The OID list represents a snapshot that becomes stale between collection and use.
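The stale-snapshot behavior can be reproduced in miniature outside the server. The following is a minimal sketch, not PostgreSQL code: the toy catalog and every toy_* name are hypothetical stand-ins, and toy_get_rel_name() only mimics the contract that get_rel_name() returns NULL when the OID no longer has a catalog entry.

```c
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical stand-in for a catalog row; not PostgreSQL's real struct. */
typedef struct ToyRel
{
    Oid         oid;
    const char *name;
    int         live;           /* set to 0 once the relation is "dropped" */
} ToyRel;

static ToyRel toy_catalog[] = {
    {16384, "orders", 1},
    {16385, "customers", 1},
    {16386, "invoices", 1},
};

#define TOY_NRELS (sizeof(toy_catalog) / sizeof(toy_catalog[0]))

/* Mimics get_rel_name()'s contract: NULL when the OID has no live entry. */
const char *
toy_get_rel_name(Oid oid)
{
    for (size_t i = 0; i < TOY_NRELS; i++)
        if (toy_catalog[i].live && toy_catalog[i].oid == oid)
            return toy_catalog[i].name;
    return NULL;
}

/* "Session A", step 1: snapshot the OIDs of all live relations. */
size_t
toy_collect_oids(Oid *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < TOY_NRELS && n < cap; i++)
        if (toy_catalog[i].live)
            out[n++] = toy_catalog[i].oid;
    return n;
}

/* "Session B": drop a relation; nothing prevents this, no lock is held. */
void
toy_drop_rel(Oid oid)
{
    for (size_t i = 0; i < TOY_NRELS; i++)
        if (toy_catalog[i].oid == oid)
            toy_catalog[i].live = 0;
}

/* Counts OIDs in a previously collected list that no longer resolve. */
size_t
toy_count_stale(const Oid *oids, size_t n)
{
    size_t stale = 0;

    for (size_t i = 0; i < n; i++)
        if (toy_get_rel_name(oids[i]) == NULL)
            stale++;
    return stale;
}
```

Calling toy_collect_oids(), then toy_drop_rel(16385), then toy_count_stale() on the earlier list reports one stale entry: the list still contains the OID, but the lookup now yields NULL.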

Why This Matters Architecturally

Proposed Solution

The patch adds NULL checks after calls to get_rel_name() and get_namespace_name() in both check_publications_origin_tables() and check_publications_origin_sequences().

If the relation name resolves to NULL (indicating the relation was dropped concurrently), the code simply skips that relation and continues processing the rest. This is semantically correct because:

  1. A dropped relation cannot be part of any publication anymore.
  2. The subscription refresh will naturally remove it from the subscription's relation set.
  3. There's no useful diagnostic or error to emit about a relation that no longer exists.
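The skip-on-NULL pattern described above might look roughly like the sketch below. It does not reproduce the actual patch: demo_rel_name(), demo_append_quoted() and demo_build_name_list() are hypothetical helpers written for illustration, the dropped OID is hard-coded, and quote escaping is omitted. The quoting helper shares one property with quote_literal_cstr(): it calls strlen() on its argument, so it must never be handed NULL.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical lookup: any OID other than the two below acts as dropped. */
const char *
demo_rel_name(Oid oid)
{
    switch (oid)
    {
        case 16384:
            return "orders";
        case 16386:
            return "invoices";
        default:
            return NULL;
    }
}

/*
 * Appends 'name' in single quotes to buf and returns the new length.
 * The caller must pass a non-NULL name: strlen(NULL) is undefined behavior.
 * If the quoted name does not fit, buf is left unchanged.
 */
size_t
demo_append_quoted(char *buf, size_t len, size_t cap, const char *name)
{
    size_t namelen = strlen(name);

    if (len + namelen + 3 > cap)    /* two quotes plus terminating NUL */
        return len;
    buf[len++] = '\'';
    memcpy(buf + len, name, namelen);
    len += namelen;
    buf[len++] = '\'';
    buf[len] = '\0';
    return len;
}

/*
 * Builds a comma-separated list of quoted names for the given OIDs,
 * skipping any OID whose lookup returns NULL. Returns the number skipped.
 */
size_t
demo_build_name_list(const Oid *oids, size_t n, char *buf, size_t cap)
{
    size_t len = 0;
    size_t skipped = 0;
    int    first = 1;

    if (cap > 0)
        buf[0] = '\0';

    for (size_t i = 0; i < n; i++)
    {
        const char *name = demo_rel_name(oids[i]);

        /* Relation vanished after the OID list was built: skip it. */
        if (name == NULL)
        {
            skipped++;
            continue;
        }

        if (!first && len + 3 <= cap)
        {
            memcpy(buf + len, ", ", 3);     /* copies the NUL as well */
            len += 2;
        }
        len = demo_append_quoted(buf, len, cap, name);
        first = 0;
    }
    return skipped;
}
```

Placing the check immediately after the lookup, before any use of the pointer, keeps the guard local to the loop body and leaves the handling of the remaining relations unchanged.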

Alternative Approaches Not Taken

Key Technical Insights

The fix is minimal and defensive. It follows the same pattern used elsewhere in PostgreSQL where catalog lookups on OIDs may return NULL for concurrently-dropped objects (e.g., pg_stat views, autovacuum workers). The principle is: if a system catalog lookup returns NULL for an OID that was valid moments ago, treat the object as gone and proceed gracefully.

Assessment

This appears to be a straightforward, low-risk bug fix for a genuine crash scenario. The patch is small in scope and follows established PostgreSQL patterns for handling concurrent DDL. It would likely be back-patched to every supported branch that contains the affected code. The subscription origin option that this checking infrastructure serves was added in PostgreSQL 16, so older branches should not be affected.