Use-after-free in postgres_fdw: Premature PGconn Disposal During Subtransaction Abort
The Core Problem
This thread addresses a concrete use-after-free bug in postgres_fdw that produces a backend crash via a reasonably realistic sequence of operations: a cursor held across a savepoint on a foreign table, followed by the remote session being terminated, followed by further subtransaction activity on the same foreign server.
The architectural issue lies in the lifecycle mismatch between two pieces of state:
ConnCacheEntry->conn— the canonical, hash-table-cachedPGconn *keyed by user mapping inconnection.c.PgFdwScanState->conn— a copy of thatPGconn *pointer cached on the executor node when the scan is initialized (postgresBeginForeignScan), and later used by cursor-management routines such asclose_cursor().
The scan-state copy is never reconciled with the cache entry. As long as the cache entry and the scan state agree, this is benign; but pgfdw_reject_incomplete_xact_state_change() can break that invariant by calling disconnect_pg_server(), which PQfinish()es the connection and NULLs ConnCacheEntry->conn. The scan state still holds the freed pointer, and any subsequent cursor cleanup dereferences it.
Why the bad state is reached
pgfdw_reject_incomplete_xact_state_change() exists because if a remote transaction-control command (RELEASE SAVEPOINT, ROLLBACK TO SAVEPOINT, etc.) was interrupted mid-flight, changing_xact_state remains true and the connection's transactional state is unknowable. Historically the FDW's response has been: eagerly drop the connection, because it cannot be trusted for further work.
The reproducer exposes that "eager" was too eager:
DECLARE c1 CURSOR FOR SELECT * FROM ft1; FETCH c1;— caches thePGconn *intoPgFdwScanState.pg_terminate_backend()on the remote kills the server-side session.ROLLBACK TO s1attemptsROLLBACK TO SAVEPOINTremotely, fails mid-command, leaveschanging_xact_state = true.- The next statement triggers
pgfdw_reject_incomplete_xact_state_change(), which callsdisconnect_pg_server()—PQfinishon the cachedPGconn, cache entry nulled. The ERROR "connection to server was lost" is raised inside the subtransaction. ROLLBACK TO s1succeeds (subtransaction abort, main transaction still live).CLOSE c1— local cursor close path, which still thinks it owns a remote cursor, callsclose_cursor()usingPgFdwScanState->conn, which is now a dangling pointer → crash.
Crucially, the ERROR was raised inside a subtransaction, so per PostgreSQL semantics we must permit further commands in the outer transaction after ROLLBACK TO. This is why "just forbid further activity" is not an acceptable answer.
Design Space and the Chosen Fix
Two families of fixes were considered:
(A) Keep the eager disconnect, but propagate invalidation. Either make PgFdwScanState->conn indirect (point to the ConnCacheEntry, dereference ->conn at use, NULL-check) or add a "dead" flag on a shared PgFdwConnState consulted by all code paths. Evan Chao initially leaned this direction; Matheus Alcantara sketched it concretely as options (1) and (2).
(B) Defer the disconnect until top-level abort cleanup. Leave pgfdw_reject_incomplete_xact_state_change() to mark the connection unusable for further SQL (it still raises the ERROR that aborts the subtransaction), but don't free the PGconn until pgfdw_xact_callback() runs at top-level abort, by which point all scan state referencing it is torn down via the normal executor shutdown paths.
Fujita (the postgres_fdw committer and original author of much of this machinery) chose (B) with two justifications:
- Back-patch surface. The fix needs to land on all supported branches. Option (A) would touch struct layouts and every use site of the cached
PGconn *across multiple files — risky for stable branches. - Option (A) gives no user-visible benefit over (B). With (B), any code path that wrongly tries to use the connection after the subtransaction abort would end up dereferencing a live-but-unusable
PGconnand fail cleanly atpgfdw_report_internal()withno connection to the server. With (A), the same path fails with essentially the same error. So (A) is "a lot of effort with little result."
Matheus independently arrived at the same conclusion after demonstrating that under patch (B), spurious use of the cached connection would surface as a clean ERROR: 08006: no connection to the server rather than a crash — which is the actual safety property we need.
What the patch changes mechanically
The patch removes the disconnect_pg_server() call from pgfdw_reject_incomplete_xact_state_change() (or equivalently guards against freeing there) and relies on pgfdw_xact_callback() / pgfdw_reset_xact_state() at top-transaction end to perform the disconnect. Michael Paquier explicitly audited this: "Looking at the locations of pgfdw_reset_xact_state(), it looks like you are getting yourself covered" — i.e., every top-transaction-end path already funnels through a location that will discard the failed connection, so nothing leaks.
The changing_xact_state = true sentinel continues to prevent any further SQL from being issued on the connection during the remainder of the top transaction; it's just that the PGconn struct stays allocated so dangling copies remain dereferenceable until the executor node that holds them is itself shut down.
Testing
Paquier suggested an isolation test modeled on temp-schema-cleanup, which uses pg_terminate_backend() to deterministically order backend death relative to other operations. Fujita opted for a simpler SQL-level regression test that polls pg_stat_activity to confirm the remote backend actually exited before proceeding — this avoids the complexity of the isolation framework while still ordering the critical events. Paquier accepted this and only requested a clearer comment ("test cleanup of failed connections on abort").
Broader Observations
- The bug is latent since at least v14 (Matheus confirmed reproducibility there) — it has survived because the preconditions (cursor on foreign table + savepoint + remote termination + further subtransaction work) are uncommon in practice.
- Fujita notes an orthogonal UX wart: a transaction whose subtransaction aborted due to remote connection loss cannot commit (remote state is lost), yet the local transaction keeps accepting commands. He explicitly defers this to a separate discussion; it would require either promoting such errors to top-level aborts or flagging the transaction as commit-forbidden.
- The fix embodies a general PostgreSQL pattern: don't free resources whose lifetime is managed by a higher-level cleanup callback earlier than that callback runs, even when you "know" they're no longer usable. Mark-and-sweep at transaction boundaries is the idiomatic approach; opportunistic early freeing almost always creates dangling-pointer hazards for any state that cached the resource.