Incremental Update: TOCTOU Patch v2 Arrives, Jeff Davis Raises Spurious Error Concern
New Development: Bertrand's TOCTOU Protection Patch (0002 replacement)
With patch 0001 committed and backpatched, Bertrand has resumed work on the next layer of protection: closing the TOCTOU (Time-of-Check-Time-of-Use) window where a REVOKE could land between the original permission check and dependency recording. This was Robert Haas's longstanding concern.
New Architectural Approach: SharedInvalidMessageCounter-based Tracking
The new patch borrows the invalidation-counter technique from RangeVarGetRelidExtended():
- At ACL check time: Record the current
SharedInvalidMessageCountervalue alongside the checked object. - Before locking (in dependency recording): Compare current counter to saved value. If changed, recheck permissions before acquiring the lock.
- After lock wait: If more invalidations arrived during the wait, release and retry.
This is fundamentally different from the earlier v20/v21 approach of pre-locking at ~70 call sites. Instead of pushing lock acquisition earlier, it detects when the world changed between check and use.
Key Implementation Details
- Tracking array stored in a dedicated
AclCheckTrackContextmemory context (child ofTopMemoryContext), reset at the start of each top-level utility statement. - Gated by
aclcheck_tracking_activeflag — only active during top-level utility statement execution, so DML/queries pay zero cost. - Flag cleared in both normal completion and
AbortTransaction()for error-path safety.
Alternatives Explicitly Discarded
- Never-free array in TopMemoryContext — rejected because high-water mark persists for backend lifetime.
- Passing privilege info (roleId, mode) through dependency recording APIs — rejected because expression-based dependencies (
recordDependencyOnExpr,find_expr_references_walker) discover objects by walking expression trees; callers never see individual objects to attach privilege info.
Jeff Davis's Critique: Spurious Errors from Incomplete Coordination
Jeff Davis raised a significant concern: RangeVarGetRelidExtended() coordinates three things (name lookup + lock + ACL check), but the new recheckAclAndLock() only coordinates lock + ACL check. This means name resolution isn't re-evaluated, leading to spurious errors.
His example demonstrates the problem concretely:
- Session 2 has
search_path=s2, s1and creates a function - Session 1 concurrently drops schema
s2 - Session 2 gets
ERROR: referenced schema was concurrently droppedeven thoughs1(the schema that would actually be used after retry) was never dropped
The function would have been created in s1 on retry — the error is technically correct but semantically misleading, as no actually referenced schema was dropped. Jeff asks whether this is expected/acceptable behavior.
This is an important design question: should the system re-resolve names (full retry) or accept occasional false-positive errors from the simpler two-way coordination?