Bug in ALTER SUBSCRIPTION ... SERVER with a Broken Old Server
Background and Architectural Context
PostgreSQL 19 introduced a significant new capability for logical replication: the ability to tie a SUBSCRIPTION to a foreign server object (CREATE SUBSCRIPTION ... SERVER) rather than requiring a raw CONNECTION string. This integrates logical replication with the existing FDW / user mapping infrastructure, allowing credentials and connection properties to be centrally managed via CREATE SERVER and CREATE USER MAPPING. The subscription's effective connection string is derived at runtime by calling ForeignServerConnectionString(), which internally validates the user mapping.
This thread uncovers a classic bootstrapping / recovery bug in this new feature: the DDL path that is supposed to let you escape from a broken server binding itself depends on the broken server being functional.
The Core Problem
The bug reproduces as follows:
- Create two foreign servers (
old_srv,new_srv) with user mappings. - Create a subscription bound to
old_srv. - Drop the user mapping on
old_srv(simulating breakage — credentials rotated, mapping removed, server misconfigured, etc.). - Attempt recovery:
ALTER SUBSCRIPTION sub_bug SERVER new_srv;→ fails withuser mapping not found for user "chaol", server "old_srv"ALTER SUBSCRIPTION sub_bug CONNECTION '...';→ same failureDROP SUBSCRIPTION sub_bug;→ also fails (discovered by Kuroda)
The subscription is effectively stranded — you cannot move it away from the broken server, nor can you delete it, without first reconstructing the exact broken user mapping.
Root Cause
The comment in AlterSubscription() is explicit about intent:
/*
* Skip ACL checks on the subscription's foreign server, if any. If
* changing the server (or replacing it with a raw connection), then the
* old one will be removed anyway. ...
*/
sub = GetSubscription(subid, false, false);
The developer clearly anticipated this scenario and passed aclcheck = false. However, the implementation of GetSubscription() does not honor this intent fully. Even when aclcheck is false, the function still unconditionally calls ForeignServerConnectionString(). That function expands the connection string by resolving the user mapping — and if the mapping has been dropped, it throws an error before AlterSubscription() ever gets a chance to replace the server binding.
In other words: the code comment describes the desired behavior, but the code path violates it by eagerly materializing a connection string that is never actually needed when the caller is just going to overwrite the server reference.
Proposed Fix
Evan Chao's patch restructures GetSubscription() and the Subscription struct:
- Store
serveridin theSubscriptionstruct. Previously only the subserver OID was available via the catalog form; now it's cached on the Subscription struct itself for lazy resolution. - Decouple ACL checking from connection-string materialization. When
aclcheck = true,GetSubscription()performs the ACL check and eagerly populatessub->conninfoviaForeignServerConnectionString(). Whenaclcheck = false, neither step happens —sub->conninfois left NULL. - Introduce
GetSubscriptionConnInfo(Subscription *sub)as a lazy accessor that resolves the connection string on demand, into the subscription's memory context. - Test coverage: a new regression test reproduces the broken-server-switch scenario. A modification to
test_fdw_connectionadds aGetUserMapping()call so that the FDW's stub connection function actually validates the mapping exists (making the test fail cleanly without the fix).
This makes the DDL paths that intend to discard the old server binding truly independent of the old server's validity — matching the stated intent of the comment.
Design Tensions and Review Feedback
Curly-brace style (Ajin)
Ajin Cherian suggested bracing the multi-line if that now contains two statements (the ereport and the ForeignServerConnectionString assignment). Evan pushed back, citing existing precedent in the codebase for brace-less multi-line if bodies with ereport. This is a minor style issue; the substantive concern is whether the grouping is visually clear.
Comment correctness on GetSubscription() (Ajin, Zsolt)
Both reviewers flagged that Evan's updated header comment inverts the semantics. The rewritten comment implies callers must call GetSubscriptionConnInfo() after GetSubscription() when aclcheck=true — but in fact it's the opposite: when aclcheck=true, conninfo is already populated; the extra accessor call is only needed when aclcheck=false. Multiple apply-worker call sites use aclcheck=true and never call the accessor, confirming Ajin's point. This needs a v2 wording fix.
Moving GetForeignServer() inside the if (Zsolt)
Zsolt Parragi asks whether GetForeignServer(sub->serverid) can also be moved inside the aclcheck branch. This is a reasonable optimization — if we don't need ACL checks and we don't need the conninfo, we arguably don't need to resolve the server object at all at this point. It would further reduce the surface area of things that can fail during a recovery-oriented DDL.
Is dropping an in-use USER MAPPING even allowed? (Kuroda)
Kuroda raises an orthogonal but architecturally interesting question: should DROP USER MAPPING be blocked when a subscription depends on it? Currently there is no dependency registered from pg_subscription to the user mapping, so no protection exists. Evan counters by pointing to an existing test in subscription.sql that appears to intentionally exercise the broken-mapping case, suggesting the behavior is considered acceptable — users are expected to repair the server or switch away. This is a philosophical design choice: PostgreSQL generally prefers dependency tracking (pg_depend) to prevent drops that would orphan references, but FDW user mappings have historically had a looser model. Adding such a dependency would be a behavioral change with its own compatibility implications, so the current thread sidesteps it by making the "switch away" path actually work.
Broader Implications
This bug is a good example of a general design principle in DDL code: recovery operations must not depend on the validity of the state they are recovering from. The same principle underlies why DROP commands typically tolerate missing dependencies, why ALTER ... OWNER TO doesn't require the old owner to exist, etc. The v19 SUBSCRIPTION-SERVER feature inadvertently violated this principle because the connection-string materialization was coupled to the generic subscription-lookup path rather than being lazy.
The lazy-accessor refactoring is the right architectural fix, but the patch needs another iteration to correct the documentation/comment errors flagged by reviewers.