Avoid Orphaned Object Dependencies in PostgreSQL: Deep Technical Analysis
The Core Problem
PostgreSQL's dependency tracking system (pg_depend) can develop orphaned entries when concurrent DDL operations race against each other. The most common manifestation is functions linked to non-existent namespaces, but the problem extends to any dependency relationship: functions referencing dropped types, tables referencing dropped types, domains referencing other domains, etc.
Race Condition Mechanics
Scenario 1 (DROP before CREATE commits):
Session 1: BEGIN; DROP SCHEMA schem;
Session 2: CREATE FUNCTION schem.foo() ... -- resolves schema OID, proceeds
Session 1: COMMIT; -- schema gone
-- Result: function exists with pronamespace pointing to non-existent schema
Scenario 2 (CREATE before DROP commits):
Session 1: BEGIN; CREATE FUNCTION schem.foo() ...
Session 2: DROP SCHEMA schem CASCADE; -- doesn't see session 1's function
Session 1: COMMIT;
-- Result: function exists with pronamespace pointing to non-existent schema
The fundamental issue is that dependency recording in recordMultipleDependencies() does not acquire any lock on the referenced object. Since DROP acquires AccessExclusiveLock via AcquireDeletionLock(), an AccessShareLock on the referenced object during dependency recording would create the necessary conflict to serialize these operations.
Why This Matters Architecturally
Orphaned dependencies cause:
pg_dumpfailures (cannot resolve object references)- Segmentation faults when catalog functions try to describe non-existent objects
- Silent data integrity corruption in the catalog
- Inability to cleanly drop or alter affected objects
This is not merely theoretical — multiple production fleets report non-negligible occurrences of this issue.
Evolution of Proposed Solutions
Approach 1: Per-case fixes (rejected)
Initially proposed fixes targeted specific cases (function-to-namespace). Tom Lane pointed out there are "dozens more cases" requiring a global approach.
Approach 2: Dirty snapshot during dependency recording (rejected)
Using a dirty snapshot in recordMultipleDependencies() to detect uncommitted deletions. Deemed "scary" and still failed to close all race conditions because it couldn't detect objects dropped before the snapshot was taken.
Approach 3: Lock on referenced objects during dependency recording (adopted direction)
Tom Lane's proposal: "creation DDL will have to take a lock on each referenced object that'd conflict with a lock taken by DROP." This became the foundation for all subsequent patch versions.
Key Design Debate: WHERE to Acquire the Lock
The central architectural disagreement in this thread is about the appropriate code layer for lock acquisition:
Position A: Lock inside dependency code (v1–v8, final v20)
- Advantage: Cannot miss any case; single enforcement point
- Disadvantage: Low-level code taking locks is surprising; harder to reason about lock ordering
Position B: Lock at caller sites (v9–v19)
- Advantage: Explicit about which operations take which locks; easier to audit
- Disadvantage: ~250 call sites to modify; easy to miss cases; maintenance burden
Robert Haas pushed strongly for Position B, leading to versions 9–19 that moved locking to upper layers. However, Jeff Davis's intervention (May 2025) argued that expecting all callers to get it right is more error-prone than centralizing it, and proposed a compromise with a LOCKMODE parameter to recordDependencyOn().
Final Resolution (v20)
The final approach (v20) returns to locking inside the dependency code but addresses Robert's concerns through a three-patch structure:
- 0001: Lock acquisition in
recordMultipleDependencies()andchangeDependencyFor(), with optimization to skip if caller already holds a conflicting lock - 0002: Pre-lock referenced objects before
object_aclcheck()calls at ~70 call sites to close TOCTOU window - 0003: Assert-based instrumentation to detect future regressions where permission checks precede lock acquisition
Critical Technical Details
The RelationRelationId Special Case
Relations use a different lock tag format than other database objects (via LockRelationOid() vs LockDatabaseObject()). The patch must handle this: when the referenced object is a relation, use CheckRelationOidLockedByMe() to check existing locks, and LockRelationOid() to acquire them.
Relations are typically already protected by:
- Direct locks from
table_open()/relation_open() - Indirect protection (index protected by lock on its table)
The SNAPSHOT_SELF / CommandCounterIncrement Issue
When creating a relation, PostgreSQL implicitly creates a composite type. When recording the relation's dependency on this type, the type may not yet be visible through a normal snapshot. Solutions evolved:
- v5–v7: Dirty snapshot to see uncommitted same-transaction work
- v8:
SnapshotSelf(cleaner, only sees own transaction's uncommitted work) - v9+:
CommandCounterIncrement()to make new entries visible normally
The TOCTOU Permission Window (P1 vs P2)
Robert raised the concern that locking inside recordMultipleDependencies() creates a window where:
- P1 (permission check before lock): User passes permission check, then concurrent REVOKE succeeds before lock acquisition, allowing unauthorized catalog modification
- P2 (lock before permission check): User acquires lock on object they may not have permission to touch
Robert argued P1 is the security-critical pattern (CVE-worthy), while P2 merely permits temporary interference. The v20 patch addresses P1 by adding explicit lock calls before permission checks at the relevant call sites.
Lock Count Overhead
Testing showed that a table with 15 UDT columns goes from 7 to 23 locks with the patch. Robert dismissed this concern: the default max_locks_per_transaction of 64 would require ~45 distinct UDTs in a single table to exhaust, which is unrealistic. The locks are AccessShareLock (weakest mode) and only conflict with AccessExclusiveLock.
changeDependencyFor() Coverage
Alexander Lakhin's testing revealed that v4 missed the changeDependencyFor() code path, used by ALTER ... SET SCHEMA operations. This was fixed in v5 by adding lock-and-check to that function as well.
Open Questions (as of last message)
Robert's final response raises two concerns about the assert mechanism in 0003:
- Should we insist on lock + permission check for every dependency, not just those that happen to have an existing
object_aclcheck()call? - The mechanism may be confused by
ProcessUtilityexecuting user-defined functions that themselves run DDL within the same statement.
Technical Complexity Assessment
The patch is inherently complex because:
- The dependency system is a cross-cutting concern touching virtually all DDL
- Lock ordering must be carefully considered to avoid deadlocks
- The distinction between "object we're creating" vs "object we're referencing" requires different treatment
- Some objects (pinned objects like built-in types) don't need locking
- The interaction with concurrent transactions requires careful snapshot management