=?UTF-8?B?T3B0aW1pemUgUmVsZmlsZW51bWJlck1hcEludmFsaWRhdGVDYWxsYmFjayBmb3IgbG9naWNh?= =?UTF-8?B?bCBkZWNvZGluZyBwZXJmb3JtYW5jZQ==?=

First seen: 2026-06-03 08:13:44+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-06-04 · claude-opus-4-6

Optimize RelfilenumberMapInvalidateCallback for Logical Decoding Performance

Core Problem

The thread addresses a significant performance bottleneck in PostgreSQL's logical decoding (walsender) pathway, specifically in the RelfilenumberMapInvalidateCallback function. This function is part of the relfilenumber map cache infrastructure — the mapping between relation OIDs and their physical file numbers (formerly relfilenode numbers, renamed in PG16+).

Architectural Context

PostgreSQL's logical decoding must process catalog invalidation messages embedded in WAL to maintain a consistent snapshot of the system catalog as it replays changes. The RelfilenumberMapInvalidateCallback is invoked whenever an invalidation message indicates that a relation's physical storage mapping has changed (e.g., due to DDL like ALTER TABLE, TRUNCATE, ANALYZE with statistics changes, etc.).

The relfilenumber map cache (RelfilenumberMapHash) is a hash table keyed by RelfilenumberMapKey (which contains the relfilenumber and the tablespace OID) and maps to the relation OID. This is used during logical decoding to resolve physical file references back to logical relations.

The Scaling Problem

The current implementation has a critical algorithmic deficiency:

When invalidating a SPECIFIC relation (relid != InvalidOid):
  - The code performs a FULL sequential scan of the entire hash table
  - It must iterate all entries to find those matching the given relid
  - This is O(n) where n = number of cached entries

This is because the hash table is keyed by (relfilenumber, tablespace)relid, but invalidation messages arrive with a relid. There is no reverse index from relid back to the hash entry.

In multi-tenant deployments or systems with many tables (common in SaaS/PaaS scenarios), the cache can grow to tens of thousands of entries. A single DDL statement can generate multiple invalidation messages, and each ALTER TABLE in a migration touching many tables compounds the problem quadratically: N tables × O(N) per invalidation = O(N²) total cost.

Why This Matters for Logical Replication

Logical decoding/replication is increasingly critical infrastructure for:

Walsender throughput directly impacts replication lag. When DDL-heavy workloads (schema migrations, ANALYZE runs on many tables) cause the walsender to stall processing invalidations, replication lag spikes can cascade into production issues. The benchmarks show this can reach 60+ seconds for 50,000 tables with separate-transaction DDL — an unacceptable stall for a replication stream.

Proposed Solution

The patch introduces two supplementary data structures to enable O(1) invalidation:

1. Reverse Hash Table (RelfilenumberReverseHash)

A new hash table mapping relid → RelfilenumberMapKey. This provides O(1) lookup when an invalidation message specifies a particular relation:

This is a classic space-time tradeoff — doubling the memory for the map cache in exchange for eliminating the O(n) scan.

2. Singly-Linked List (NegativeEntryList)

Negative cache entries (where relid == InvalidOid) represent "known-missing" mappings — relfilenumber values that have been looked up but don't correspond to any known relation. These cannot be indexed in the reverse hash (since their relid is Invalid/0), so they are tracked in a separate linked list.

When a "wildcard" invalidation occurs (invalidating all entries), both the hash tables and this list are cleared. When a specific relid is invalidated, negative entries are unaffected (they don't match any specific relid anyway).

Algorithmic Complexity Improvement

Operation Before After
Invalidate specific relid O(n) scan O(1) hash lookup + delete
Invalidate all (relid=0) O(n) scan O(n) — unchanged, must clear all
Insert new mapping O(1) O(1) — two hash inserts
Memory overhead 1 hash table 2 hash tables + linked list

Performance Analysis

The benchmark results are compelling and demonstrate the expected quadratic-to-linear improvement:

Separate transactions (50K tables): 61.8s → 4.1s (15.2× speedup)

Single transaction (50K tables): 29.2s → 0.34s (85.2× speedup)

The fact that DML-only performance is unchanged confirms the patch has no regression for the common case (DML-only workloads don't trigger relfilenumber invalidations frequently).

Design Considerations and Potential Discussion Points

Memory Overhead

The reverse hash table roughly doubles the memory consumption of the relfilenumber map cache. For 50K tables, this might be ~4-8 MB additional memory per walsender process. Given that this cache only exists in logical decoding contexts and the memory is proportional to active relations, this is likely acceptable.

Maintenance Burden

Two synchronized data structures (forward and reverse hash) plus a linked list introduce more code paths that must remain consistent. Any bug that desynchronizes them would cause cache corruption — entries that can't be invalidated or phantom invalidations.

Alternative Approaches Not Discussed (Yet)

Negative Entry Handling

The linked list for negative entries is simpler than a hash but has O(n) traversal for full invalidation. This is acceptable because negative entries should be rare in production (they represent failed lookups) and full invalidation already requires clearing the main hash table anyway.

Status

This is an initial submission (v1 patch) with no responses yet. Given the clear performance improvement and the relatively contained scope of changes (localized to the relfilenumber map cache in src/backend/utils/cache/ area), this patch has good potential for acceptance if:

  1. The implementation is clean and handles edge cases (concurrent access, memory contexts)
  2. The additional memory overhead is deemed acceptable
  3. No regressions are found in the regression test suite or stress testing
  4. The code is well-documented explaining the dual-hash-table design

The patch targets a real-world pain point that is increasingly relevant as PostgreSQL adoption grows in multi-tenant/microservice architectures with large numbers of tables.