[PATCH] Preserve replication origin OIDs in pg_upgrade

First seen: 2026-04-28 11:19:38+00:00 · Messages: 7 · Participants: 4

Latest Update

2026-05-06 · opus 4.7

Preserving Replication Origin OIDs in pg_upgrade

The Core Architectural Problem

This thread addresses a silent data-correctness hazard at the intersection of three PostgreSQL subsystems: pg_upgrade, logical replication origins, and commit timestamps (pg_commit_ts). The bug manifests as spurious update_origin_differs conflicts after a major-version upgrade of a logical replication subscriber, and in the worst case causes the subscriber to attribute row modifications to the wrong upstream publisher.

Why roidents are embedded in pg_commit_ts

When track_commit_timestamp is enabled, each committed transaction's SLRU record in pg_commit_ts stores not just the commit time but also a RepOriginId (a 2-byte roident). Subscriber-side conflict detection, specifically the update_origin_differs and delete_origin_differs conflict types, uses this value to determine whether a local row was last modified by the local node or by some remote origin. Crucially, the roident stored in the SLRU is a numeric identifier, not the textual origin name (pg_<suboid>). The mapping from roident → roname lives in the pg_replication_origin catalog.
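To make the indirection concrete, here is a minimal Python sketch of a commit-timestamp entry and the catalog lookup. CommitTsEntry, the catalog dictionaries, origin_name_for, and the OID values are hypothetical stand-ins for illustration, not PostgreSQL APIs; treating roident 0 as "local" is a simplification of this sketch.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of one pg_commit_ts entry.
@dataclass(frozen=True)
class CommitTsEntry:
    commit_time_us: int  # commit timestamp (illustrative unit)
    roident: int         # numeric origin id; 0 stands for "local" in this sketch

# Hypothetical stand-in for pg_replication_origin: roident -> roname.
catalog = {1: "pg_16401", 2: "pg_16402"}

def origin_name_for(entry, catalog):
    """Resolve the numeric id stored with the commit through the catalog."""
    if entry.roident == 0:
        return "local"
    return catalog.get(entry.roident, "<unknown origin>")

entry = CommitTsEntry(commit_time_us=1_000_000, roident=1)
name_before = origin_name_for(entry, catalog)

# Re-pointing roident 1 in the catalog changes what the unchanged entry means.
remapped = {1: "pg_16402", 2: "pg_16401"}
name_after = origin_name_for(entry, remapped)
```

The entry itself never changes; only the catalog it is interpreted against does, which is why the stored number is only as trustworthy as the mapping.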

The breakage under pg_upgrade

The chain of fragility is:

  1. pg_upgrade preserves relfilenodes, TOAST OIDs, relation OIDs, type OIDs, etc., but historically does not preserve subscription OIDs.
  2. A subscription's replication origin is conventionally named pg_<suboid> (see ApplyWorkerMain / replorigin_by_name usage). If the suboid changes, the origin name changes.
  3. During CREATE SUBSCRIPTION on the new cluster, replorigin_create() allocates a fresh roident by scanning pg_replication_origin for the lowest unused 2-byte ID. The order of assignment depends on the order CREATE SUBSCRIPTION runs and on prior allocations — it is not stable across upgrades.
  4. Meanwhile, pg_commit_ts was proposed (in the sibling thread referenced as [1]) to be copied byte-for-byte from the old cluster to preserve conflict-detection metadata.

The result is a semantic mismatch: SLRU records say "roident 1 wrote this row" meaning subA in the old cluster, but the new cluster's catalog says roident 1 is subB. Conflict detection will incorrectly fire (or fail to fire). Ajin's opening message frames this crisply with the "swap" scenario — the most dangerous case because it converts silent metadata into actively wrong conflict verdicts.
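The swap scenario can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the names subA/subB, the row key, and the origin_differs helper, which reduces the real conflict check to a comparison between the roident recorded for the row and the roident the applying subscription runs under.

```python
# Hypothetical model: roident -> subscription, before and after the upgrade.
old_catalog = {1: "subA", 2: "subB"}
new_catalog = {1: "subB", 2: "subA"}  # same subscriptions, ids swapped

# Copied verbatim from the old cluster: row key -> roident of the last writer.
commit_ts = {"row1": 1}  # last written by subA in the old cluster

def roident_of(catalog, sub):
    """Reverse lookup: which numeric id does this subscription apply under?"""
    return next(rid for rid, name in catalog.items() if name == sub)

def origin_differs(catalog, row, applying_sub):
    """Simplified check: does the row's recorded origin differ from the applier's?"""
    return commit_ts[row] != roident_of(catalog, applying_sub)

# Old cluster: verdicts match reality.
old_a = origin_differs(old_catalog, "row1", "subA")  # same origin, no conflict
old_b = origin_differs(old_catalog, "row1", "subB")  # different origin, conflict

# New cluster with copied commit_ts and swapped ids: both verdicts flip.
new_a = origin_differs(new_catalog, "row1", "subA")  # spurious conflict
new_b = origin_differs(new_catalog, "row1", "subB")  # missed conflict
```

In this model both verdicts invert: subA updating its own row is reported as a conflict, while subB overwriting subA's row passes silently.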

Design Evolution: Two Competing Approaches

Approach 1 (v1): Special-case subscription origins

Ajin's initial patch took a surgical approach: it treats subscription-associated origins as a distinct class from user-created origins (e.g., those created manually via pg_replication_origin_create() for custom replication solutions like pglogical or bidirectional setups).

Approach 2 (v2/v3): Preserve subscription OIDs, then everything falls out

Kuroda-san's response reframes the problem: if subscription OIDs were preserved across pg_upgrade, the origin name pg_<suboid> would be stable. As Shveta correctly pushes back, however, name stability alone does not imply roident stability, because roident allocation is independent and order-dependent.

Shveta's counterexample is important and technically precise: with two subscriptions at roidents 2 and 3 (because roident 1 had been used and dropped), re-creation from scratch would allocate from 1 upward, yielding 1 and 2. Same names, different numeric IDs, same bug.
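Shveta's counterexample can be reproduced with a small Python model of lowest-unused allocation. The allocate_roident and create_origin helpers, the origin names, and the max_id bound are illustrative assumptions; they mimic the allocation behavior described above rather than the actual replorigin_create() code.

```python
def allocate_roident(in_use, max_id=0xFFFF):
    """Return the lowest positive id not in in_use (simplified allocation model)."""
    for candidate in range(1, max_id + 1):
        if candidate not in in_use:
            return candidate
    raise RuntimeError("no free replication origin id")

def create_origin(catalog, name):
    """Add name to the catalog under the lowest unused id and return that id."""
    roident = allocate_roident(catalog.keys())
    catalog[roident] = name
    return roident

# Old cluster: an earlier origin took id 1 and was later dropped.
old = {}
create_origin(old, "pg_dropped")  # gets 1
create_origin(old, "pg_16401")    # gets 2
create_origin(old, "pg_16402")    # gets 3
del old[1]

# New cluster: same names (subscription OIDs preserved), created from scratch.
new = {}
create_origin(new, "pg_16401")    # gets 1
create_origin(new, "pg_16402")    # gets 2
```

Here the old cluster ends with ids 2 and 3 while the new one assigns 1 and 2 to the same names, so commit-timestamp entries carrying roident 2 now resolve to the other subscription.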

Vignesh then produces the missing piece: a rebased patch that preserves subscription OIDs themselves. Ajin's v3 composes these:

This is architecturally cleaner: it eliminates the special case, reduces the number of pg_upgrade support functions, and ensures the roident preservation is a consequence of the same generic mechanism that handles manually-created origins.

Key Technical Tradeoffs and Subtleties

  1. OID preservation as a precondition. Kuroda's reference to the earlier thread [1] notes that subscription-OID preservation had been previously proposed but rejected for lack of motivation. The conflict-detection correctness argument here is the "strong motivation" that was missing, which is an important procedural point. Preserving subscription OIDs has minor implications (the OID namespace is shared, and OID assignment for pg_subscription, normally done through GetNewOidWithIndex, must honor the preserved value in binary-upgrade mode), but no downsides surface in the thread.

  2. Non-subscription origins matter too. The v3 approach correctly handles user-created origins (e.g., those used by logical replication extensions, custom apply workers, or bidirectional configurations). The v1 approach already handled them; v3 unifies them with subscription origins. This is important because pg_commit_ts records reference any roident, not just subscription-derived ones.

  3. LSN position (remote_lsn) preservation. Origin state isn't just an OID — it includes the replication progress LSN (replorigin_session_origin_lsn / the value advanced by pg_replication_origin_advance). Both approaches preserve this via binary_upgrade_replorigin_advance(). Without it, the subscriber would re-request changes from the publisher starting at an earlier LSN, causing duplicate apply and conflicts.

  4. Ordering of restore steps. In v3: dumpall creates origins (with preserved roident+name+LSN) → per-database CREATE SUBSCRIPTION runs but is instructed to skip origin creation → subscriptions are re-enabled. The skip is necessary because CREATE SUBSCRIPTION would otherwise try replorigin_create() on an already-existing name.

  5. Coupling with the pg_commit_ts migration patch. This patch is only useful if pg_commit_ts is actually being migrated (the sibling thread [1]). Without that migration, there are no stale roident references to worry about. The two patches are logically a unit.
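The restore ordering in item 4 can be sketched with an in-memory model. This is only an illustration under stated assumptions: the Cluster class, its method names, the skip_origin_creation flag, and the sample OID and LSN are hypothetical and do not correspond to the patch's actual functions or options.

```python
class Cluster:
    """Hypothetical in-memory stand-in for the new cluster's origin catalog."""

    def __init__(self):
        self.origins = {}  # roident -> (roname, remote_lsn)

    def _has_name(self, roname):
        return any(name == roname for name, _ in self.origins.values())

    def restore_origin(self, roident, roname, remote_lsn):
        # Global restore step: recreate the origin with its old id, name, and LSN.
        if roident in self.origins or self._has_name(roname):
            raise ValueError(f"origin {roident}/{roname} conflicts with an existing origin")
        self.origins[roident] = (roname, remote_lsn)

    def create_subscription(self, suboid, skip_origin_creation):
        roname = f"pg_{suboid}"
        if skip_origin_creation:
            # Upgrade path: the origin must already have been restored.
            if not self._has_name(roname):
                raise LookupError(f"expected restored origin {roname}")
            return roname
        # Normal path: create a fresh origin under the lowest unused id.
        if self._has_name(roname):
            raise ValueError(f"origin {roname} already exists")
        roident = next(i for i in range(1, 0x10000) if i not in self.origins)
        self.origins[roident] = (roname, 0)
        return roname

new_cluster = Cluster()
new_cluster.restore_origin(2, "pg_16401", 0x16B3748)
new_cluster.create_subscription(16401, skip_origin_creation=True)

# Without the skip, the subscription would collide with the restored origin.
try:
    new_cluster.create_subscription(16401, skip_origin_creation=False)
    collided = False
except ValueError:
    collided = True
```

The guard in restore_origin also shows one way the "conflicting roident on the target" case could be turned into a hard failure rather than a silent remap.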

Participant Dynamics

All four participants are active in the logical replication area, and the discussion converges quickly (within ~8 days from proposal to v3) — suggesting broad agreement on the problem and solution shape.

Assessment

The v3 design is the right one. It:

Open questions not fully resolved in the visible thread: behavior when the new cluster already has a conflicting roident (shouldn't happen on a fresh target, but worth asserting), and whether the preserved-origin creation should be gated on track_commit_timestamp being enabled on the old cluster (arguably always preserve, since the overhead is negligible and extensions may rely on origin identity).