Technical Analysis: pg_stat_lock Blocker Mode Dimension
Core Problem
PostgreSQL's pg_stat_lock view (introduced in recent versions to provide cumulative lock wait statistics) currently aggregates lock contention data only by locktype. This means operators can see that lock waits are occurring on relations, but cannot distinguish what kind of operation is causing those waits without resorting to parsing log_lock_waits output or deploying external sampling extensions like pg_wait_sampling.
The fundamental architectural gap is that lock contention diagnostics lack a critical dimension: the lock mode of the blocker. In production, the difference between waits caused by ShareUpdateExclusiveLock (VACUUM) and AccessExclusiveLock (DDL) is operationally crucial: the former suggests autovacuum tuning issues, while the latter points to migration or deployment problems. Today, telling the two apart requires correlating logs or sampling pg_locks in real time, both of which are lossy and operationally expensive.
Proposed Solution
Architecture
The patch adds a mode column to pg_stat_lock, expanding the statistics aggregation key from [locktype] to [locktype, mode]. This means PgStatShared_Lock structures in shared memory are expanded to cover the cross-product of lock types and lock modes.
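To make the key change concrete, here is a small Python model of counters keyed by (locktype, mode). It is a sketch, not the patch's C code: LockStats, record_wait, and by_locktype are hypothetical names, the wait-time unit is assumed, and only the column names waits and wait_time come from the description above. It also shows that summing over mode recovers the old per-locktype totals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LockStats:
    """Counters for one (locktype, mode) key; field names mirror the view's columns."""
    waits: int = 0
    wait_time: float = 0.0  # unit assumed to be milliseconds for this sketch


# Hypothetical stand-in for the shared-memory array: one entry per (locktype, mode).
stats = defaultdict(LockStats)


def record_wait(locktype, blocker_mode, elapsed):
    """Charge one completed wait to the (locktype, blocker mode) key."""
    entry = stats[(locktype, blocker_mode)]
    entry.waits += 1
    entry.wait_time += elapsed


def by_locktype():
    """Collapse the mode dimension to recover the old per-locktype shape."""
    totals = defaultdict(int)
    for (locktype, _mode), entry in stats.items():
        totals[locktype] += entry.waits
    return dict(totals)


# Made-up events for illustration only.
record_wait("relation", "ShareUpdateExclusiveLock", 120.0)
record_wait("relation", "AccessExclusiveLock", 900.0)
record_wait("relation", "AccessExclusiveLock", 450.0)
record_wait("transactionid", "ExclusiveLock", 30.0)
```

A monitoring tool that only understands the old shape could apply the same collapse in SQL by grouping on locktype and summing the counters.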
Blocker Mode Capture Algorithm
The mode is determined at the point where a lock requester joins the wait queue, under the lock partition LWLock (so no additional locking is required):
- Primary rule: Among all modes m where lock->granted[m] > 0 that conflict with the requester's mode, select the strongest (highest-numbered in the lock mode ordering).
- Fallback rule: If no held mode conflicts (a pure queue-priority wait), select the strongest mode in lock->waitMask that conflicts.
This is architecturally sound because:
- The lock partition LWLock is already held at this point in ProcSleep() / the wait-queue insertion path
- lock->granted[] and lock->waitMask are already maintained and available
- The "strongest conflicting" heuristic captures the mode whose release is necessary for the waiter to proceed
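The primary and fallback rules can be sketched as a small Python model. This is an illustration under stated assumptions, not the patch's code: blocker_mode is a hypothetical function name, granted is modeled as a dict rather than the lock->granted[] array, and wait_mask is modeled as a set rather than a bitmask. The conflict table follows PostgreSQL's documented table-level lock conflicts.

```python
# Lock modes in PostgreSQL's numbering (1 = weakest, 8 = strongest).
MODES = [
    "AccessShareLock", "RowShareLock", "RowExclusiveLock",
    "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
    "ExclusiveLock", "AccessExclusiveLock",
]
NUM = {name: i + 1 for i, name in enumerate(MODES)}

# CONFLICTS[m] is the set of mode numbers that conflict with mode m.
CONFLICTS = {
    1: {8},
    2: {7, 8},
    3: {5, 6, 7, 8},
    4: {4, 5, 6, 7, 8},
    5: {3, 4, 6, 7, 8},
    6: {3, 4, 5, 6, 7, 8},
    7: {2, 3, 4, 5, 6, 7, 8},
    8: {1, 2, 3, 4, 5, 6, 7, 8},
}


def blocker_mode(requested, granted, wait_mask):
    """Return the mode name to charge a wait to, or None if nothing conflicts.

    granted:   dict of mode number -> holder count (models lock->granted[]).
    wait_mask: set of mode numbers awaited in the queue (models lock->waitMask).
    """
    conflicting = CONFLICTS[NUM[requested]]
    held = [m for m, count in granted.items() if count > 0 and m in conflicting]
    if held:  # primary rule: strongest conflicting held mode
        return MODES[max(held) - 1]
    queued = [m for m in wait_mask if m in conflicting]
    if queued:  # fallback rule: strongest conflicting queued mode
        return MODES[max(queued) - 1]
    return None


# Writers and a VACUUM hold locks; a ShareLock request (e.g. CREATE INDEX) arrives.
primary = blocker_mode(
    "ShareLock",
    {NUM["RowExclusiveLock"]: 2, NUM["ShareUpdateExclusiveLock"]: 1},
    set(),
)

# A reader holds AccessShareLock, DDL is queued, and a second reader arrives.
fallback = blocker_mode(
    "AccessShareLock",
    {NUM["AccessShareLock"]: 1},
    {NUM["AccessExclusiveLock"]},
)
```

In the first case both held modes conflict with ShareLock, and the rule picks ShareUpdateExclusiveLock as the stronger of the two. In the second case no held mode conflicts, so the wait is charged to the queued AccessExclusiveLock.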
Shared Memory Cost
The expansion adds approximately 2.3 kB per cluster: the existing per-lock-type array gains a second dimension for lock modes, of which PostgreSQL defines eight (AccessShareLock through AccessExclusiveLock). This is negligible.
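As a rough plausibility check on that figure, the arithmetic below uses assumed sizes that are not taken from the patch: 12 lock tag types, 8 lock modes, and three 8-byte counters per entry (waits, wait_time, fastpath_exceeded). Under those assumptions the expanded array lands in the same range as the quoted number.

```python
# All four constants are assumptions for this estimate, not values from the patch.
LOCK_TYPES = 12          # assumed number of lock tag types
LOCK_MODES = 8           # assumed number of lock modes tracked
COUNTERS_PER_ENTRY = 3   # waits, wait_time, fastpath_exceeded
BYTES_PER_COUNTER = 8    # assumed 64-bit counters

bytes_before = LOCK_TYPES * COUNTERS_PER_ENTRY * BYTES_PER_COUNTER
bytes_after = bytes_before * LOCK_MODES
bytes_added = bytes_after - bytes_before

print(bytes_before, bytes_after, bytes_added)  # 288 2304 2016
```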
Fast Path Handling
Critically, no new instrumentation is added to the fast-path lock acquisition code. The blocker-mode snapshot logic runs only when a request would otherwise wait, meaning there is zero overhead on the uncontended hot path. This is an important design constraint that preserves the performance characteristics of fast-path locking.
Key Design Decisions and Tradeoffs
1. Blocker Mode vs. Requester Mode vs. Both
The author explicitly considered three alternatives:
- Requester mode only: Simpler but less operationally useful, since you can often infer what the requester was doing from context
- Both modes: Most informative but explodes the view's row count (modes × modes × locktypes) and likely overlaps with pg_wait_sampling use cases
- Blocker mode only (chosen): Answers the operational question "what is causing contention" directly
This is a pragmatic middle ground. The blocker mode is the actionable information — knowing that VACUUM is blocking your workload tells you to tune autovacuum, while knowing DDL is blocking tells you to fix your migration strategy.
2. Dual Semantics of the Mode Column
The most architecturally awkward aspect is that the mode column has different semantics depending on which counter is being examined:
- For waits/wait_time: mode = the blocker's lock mode
- For fastpath_exceeded: mode = the requester's lock mode (because slot exhaustion has no blocker)
The author acknowledges this tension and proposes documenting it rather than splitting views or NULLing values. The column is deliberately named mode (not blocker_mode) to accommodate this dual use. This is a defensible choice — splitting into separate views would complicate monitoring queries, and NULLing loses useful per-mode breakdown of fast-path exhaustion.
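For a consumer of the view, the practical consequence is that the same row has to be read two ways. The sketch below assumes rows arrive as dicts with the column names mentioned above; split_by_semantics and the sample values are hypothetical and purely illustrative.

```python
def split_by_semantics(rows):
    """Split view rows into two reports, since mode means different things per counter.

    rows: iterable of dicts with keys locktype, mode, waits, wait_time,
    fastpath_exceeded.
    """
    blocked_by = {}   # here mode is the blocker's lock mode
    fastpath_by = {}  # here mode is the requester's lock mode
    for r in rows:
        key = (r["locktype"], r["mode"])
        if r["waits"] > 0:
            blocked_by[key] = (r["waits"], r["wait_time"])
        if r["fastpath_exceeded"] > 0:
            fastpath_by[key] = r["fastpath_exceeded"]
    return blocked_by, fastpath_by


# Made-up sample rows for illustration only.
sample = [
    {"locktype": "relation", "mode": "AccessExclusiveLock",
     "waits": 4, "wait_time": 2500, "fastpath_exceeded": 0},
    {"locktype": "relation", "mode": "AccessShareLock",
     "waits": 1, "wait_time": 80, "fastpath_exceeded": 37},
]
blocked_by, fastpath_by = split_by_semantics(sample)
```

In the second sample row, waits = 1 would mean one wait was blocked by a holder of AccessShareLock, while fastpath_exceeded = 37 would mean 37 requests for AccessShareLock overflowed the fast-path slots.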
3. Chained Wait Attribution
The open question about chained waits reveals a fundamental limitation of per-event attribution in any cumulative statistics system:
TX1 holds AccessShareLock (long SELECT)
TX2 requests AccessExclusiveLock → blocked by TX1 (attributed to AccessShareLock)
TX3 requests AccessShareLock → blocked by TX2 (attributed to AccessExclusiveLock)
TX3's wait is proximately caused by TX2's AccessExclusiveLock (which causes queue-priority blocking), but ultimately caused by TX1's long SELECT. The patch correctly attributes to the proximate blocker, which is:
- Consistent with how pg_stat_lock already works (per-waiter attribution)
- The only option that doesn't require expensive transitive-closure computation under the partition LWLock
- Individually accurate for each waiter's experience
Walking the full blocker chain would require either holding multiple partition LWLocks or accepting stale data, both unacceptable for a statistics increment path.
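The difference between proximate and root attribution can be shown with a toy wait-for graph built from the TX1/TX2/TX3 example. The dicts and function names here are hypothetical; the lock manager keeps no such structure, which is part of why the root walk would be costly there.

```python
# Wait-for edges from the example: waiter -> transaction directly ahead of it.
waits_for = {"TX2": "TX1", "TX3": "TX2"}

# Mode each transaction holds (TX1) or has requested (TX2, TX3).
mode_of = {
    "TX1": "AccessShareLock",
    "TX2": "AccessExclusiveLock",
    "TX3": "AccessShareLock",
}


def proximate_blocker(tx):
    """One hop: the kind of attribution a per-waiter counter records."""
    return waits_for.get(tx)


def root_blocker(tx):
    """Follow edges until reaching a transaction that is not itself waiting."""
    seen = set()
    while tx in waits_for and tx not in seen:  # seen guards against cycles
        seen.add(tx)
        tx = waits_for[tx]
    return tx


tx3_proximate = mode_of[proximate_blocker("TX3")]
tx3_root = mode_of[root_blocker("TX3")]
```

Here tx3_proximate is AccessExclusiveLock and tx3_root is AccessShareLock, matching the two readings of TX3's wait described above.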
Relationship to Existing Infrastructure
The patch leverages GetLockHoldersAndWaiters(), which already computes holder modes for log_lock_waits. This means the core logic is already battle-tested in production — the patch is essentially promoting information that's already computed in one code path into a persistent statistics aggregation.
The implementation sits at the intersection of:
- Lock manager (lock.c, proc.c): Where the blocker mode is determined
- Cumulative statistics system (pgstat_lock.c): Where the aggregation occurs
- System views (pg_stat_lock): Where results are exposed
Potential Concerns for Review
- Catalog version bump: Adding a column to a system view requires a catversion bump
- pg_stat_reset() behavior: The expanded statistics keys need proper reset handling
- Backward compatibility: Monitoring tools querying pg_stat_lock will see schema changes
- "Strongest mode" heuristic correctness: When multiple conflicting modes are held simultaneously, "strongest" may not always correspond to the last one to be released, but it's a reasonable approximation without tracking per-holder grant order
- Statistics naming: Whether mode adequately communicates the dual semantic without confusing users who expect it to always mean "blocker mode"