Avoid orphaned objects dependencies, take 3

First seen: 2024-04-22 08:45:19+00:00 · Messages: 74 · Participants: 9

Latest Update

2026-06-04 · claude-opus-4-6

Incremental Update: TOCTOU Patch v2 Arrives, Jeff Davis Raises Spurious Error Concern

New Development: Bertrand's TOCTOU Protection Patch (0002 replacement)

With patch 0001 committed and backpatched, Bertrand has resumed work on the next layer of protection: closing the TOCTOU (Time-of-Check-Time-of-Use) window where a REVOKE could land between the original permission check and dependency recording. This was Robert Haas's longstanding concern.

New Architectural Approach: SharedInvalidMessageCounter-based Tracking

The new patch borrows the invalidation-counter technique from RangeVarGetRelidExtended():

  1. At ACL check time: Record the current SharedInvalidMessageCounter value alongside the checked object.
  2. Before locking (in dependency recording): Compare current counter to saved value. If changed, recheck permissions before acquiring the lock.
  3. After lock wait: If more invalidations arrived during the wait, release and retry.

This is fundamentally different from the earlier v20/v21 approach of pre-locking at ~70 call sites. Instead of pushing lock acquisition earlier, it detects when the world changed between check and use.

Key Implementation Details

Alternatives Explicitly Discarded

  1. Never-free array in TopMemoryContext — rejected because high-water mark persists for backend lifetime.
  2. Passing privilege info (roleId, mode) through dependency recording APIs — rejected because expression-based dependencies (recordDependencyOnExpr, find_expr_references_walker) discover objects by walking expression trees; callers never see individual objects to attach privilege info.

Jeff Davis's Critique: Spurious Errors from Incomplete Coordination

Jeff Davis raised a significant concern: RangeVarGetRelidExtended() coordinates three things (name lookup + lock + ACL check), but the new recheckAclAndLock() only coordinates lock + ACL check. This means name resolution isn't re-evaluated, leading to spurious errors.

His example demonstrates the problem concretely:

The function would have been created in s1 on retry — the error is technically correct but semantically misleading, as no actually referenced schema was dropped. Jeff asks whether this is expected/acceptable behavior.

This is an important design question: should the system re-resolve names (full retry) or accept occasional false-positive errors from the simpler two-way coordination?

History (2 prior analyses)
2026-06-01 · claude-opus-4-6

Monthly Summary: Avoid Orphaned Object Dependencies, Take 3 — May 2026

Overview

This month saw significant progress toward landing the patch that prevents orphaned object dependencies in PostgreSQL's catalog. The core problem — race conditions between concurrent DDL operations causing pg_depend entries to reference non-existent objects — moved from extended design debate to active commit preparation, with committer Heikki Linnakangas stepping in to refactor and prepare patch 0001 for independent landing.

Background & Problem

PostgreSQL's dependency tracking system can develop orphaned entries when concurrent DDL races (e.g., CREATE FUNCTION in one session while DROP SCHEMA executes in another). Neither operation sees the other's uncommitted work, resulting in catalog entries pointing to non-existent objects. This causes pg_dump failures, segfaults in catalog functions, and silent integrity corruption.

The fix adds AccessShareLock acquisition on referenced objects during dependency recording (recordMultipleDependencies() and changeDependencyFor()), which conflicts with the AccessExclusiveLock taken by DROP operations, serializing them properly.

Key Developments This Month

v21 Patch: Addressing Robert Haas's Open Concerns

Bertrand Drouvot posted v21, resolving two concerns from Robert's review of v20:

  1. Not every dependency needs lock + permission check. Bertrand demonstrated with CREATE VIEW v1 AS SELECT myfunc(a) FROM t1 — a dependency on myfunc() is recorded without requiring EXECUTE privilege (deferred to query time). This proves blanket permission-check requirements would be overly restrictive.

  2. Assert mechanism cannot be confused by nested DDL. Robert worried that ProcessUtility executing UDFs containing DDL could confuse the tracking. Bertrand showed false positives are impossible (locks are transaction-scoped) and false negatives require an unrealistic coincidence of conditions.

Additionally, v21 fixed a regression triggered by concurrent commit 4793fc41f82, which introduced a new P1 pattern caught by the assert infrastructure — validating its utility.

Heikki Linnakangas Takes Over 0001

The most significant development: Heikki began actively refactoring patch 0001 with intent to commit it independently. Key changes:

  • Renamed LockNotPinnedObject()dependencyLockAndCheckObject() — aligning with the existing shdepLockAndCheckObject() precedent. This naming highlights that PostgreSQL already does lock-and-check for shared dependencies; the patch extends it to non-shared ones.
  • Fixed a bug in the relation codepath — The original patch acquired AccessShareLock on relations but didn't re-check existence afterward, defeating the purpose. Heikki added SearchSysCacheExists1(RELOID, ...) after LockRelationOid().
  • Removed ObjectByIdExist() utility function — Too special-purpose to expose as general API; logic inlined.
  • Improved error messages — Now reads "dependent relation was concurrently dropped", matching existing patterns and avoiding OIDs for test stability.
  • Added role-dependency test — Covering the existing shdepLockAndCheckObject() path.

Strategic Decision: Incremental Commit Approach

The decision to land 0001 independently is architecturally significant:

  • 0001 (core lock-during-dependency-recording): being prepared for commit now
  • 0002 (pre-lock before permission checks at ~70 call sites): will follow separately
  • 0003 (assert infrastructure for regression detection): will follow separately

This reduces review burden and risk while delivering the primary protection immediately.

Lock Overhead Assessment

Testing confirmed manageable overhead: a table with 15 UDT columns goes from 7 to 23 locks. With max_locks_per_transaction defaulting to 64, exhaustion requires ~45 distinct UDTs in a single table — unrealistic in practice. All new locks are AccessShareLock (weakest mode).

Current Status

Patch 0001 is in final review/polish with Heikki actively preparing it for commit. Minor items remain (unnecessary #include, pgindent formatting, whether the role-dependency test should be a preparatory commit). The core design debate is resolved — locking happens inside the dependency recording code with appropriate optimizations to skip when callers already hold conflicting locks.


2026-06-01 · claude-opus-4-6

Incremental Update: Patch 0001 COMMITTED and BACKPATCHED

Major Milestone: Core Fix Lands in PostgreSQL

Heikki Linnakangas has committed and backpatched patch 0001 — the core lock-during-dependency-recording mechanism. This is the most significant event in this thread's multi-year history: the fundamental protection against orphaned object dependencies is now in the codebase.

Key Technical Details of the Final Commit

  1. Backpatched to version 14 — This is a bug fix, not a feature, so it's going into all supported branches. Version 14 required additional work because it lacked the IsPinnedObject() function that newer branches have.

  2. Fixed a pre-existing bug exposed by the patchrecordDependencyOn() was being called with InvalidOid during ALTER TABLE commands on tables with dropped columns. This was "bogus but harmless" before the patch (no lock attempt on InvalidOid), but the new lock-and-check logic would fail on it. Heikki fixed this as part of the commit.

  3. Bertrand's minor feedback incorporated — The unnecessary #include "catalog/index.h" and pgindent formatting issues were fixed, along with additional copy-editing on comments.

  4. Role dependency test included in same commit — Heikki decided against splitting the role-dependency test into a separate preparatory commit, judging it "not really any better or worse" than including it together.

What Remains

Patches 0002 (pre-lock before object_aclcheck() calls to close TOCTOU window) and 0003 (assert infrastructure to detect future P1 regressions) are still pending review and commit. However, the most critical protection — preventing orphaned dependencies from concurrent CREATE/DROP races — is now active.