Technical Analysis: refint Plan Cache Bug and Resolution
The Core Problem
The contrib/spi/refint module provides check_foreign_key() and check_primary_key() trigger functions that implement referential integrity constraints in pure PL/C using SPI. These predate PostgreSQL's built-in foreign key mechanism and are now maintained primarily as example code for the SPI API.
The bug centers on check_foreign_key()'s private plan cache. When a cascade UPDATE trigger fires, the function constructs an UPDATE query for the referencing table where the SET clause embeds the literal NEW key values from the current trigger invocation. For example, if updating a referenced key from 1→2, it generates something like:
UPDATE referencing_table SET fk_col = 2 WHERE fk_col = $1
The value 2 is hard-coded into the query text, while only the OLD key value is parameterized (as $1). This prepared plan is then cached, keyed by the trigger name.
When a second cascade UPDATE fires through the same trigger (e.g., updating another row in the referenced table from 3→4), the cached plan is retrieved and reused. But the cached plan still contains SET fk_col = 2 from the first invocation—so the referencing row that should become 4 instead becomes 2.
Why Parameterization Doesn't Work
The natural fix would be to parameterize the SET values too. However, as Nathan Bossart noted in the related BUG #19476 thread, the referenced table's key column type is not necessarily assignment-compatible with the referencing table's column type in the SET target. The types might differ, making a generic parameterized approach type-unsafe.
Secondary Problem: No Cache Invalidation
The plan cache is keyed by trigger name (a string), not by trigger OID or any versioned identifier. If a trigger is dropped and recreated with the same name but different parameters (different number of referencing tables, different columns), the stale cached plans cause errors like:
ERROR: t: check_foreign_key: # of plans changed in meantime
Or worse, silently use wrong plans if the number of references happens to match. Using trigger OID as the key would help but doesn't handle OID wraparound/reuse.
Evolution of the Fix
v1: Skip caching for cascade UPDATE only
The initial patch preserved the plan cache for restrict, cascade DELETE, and setnull actions, but skipped caching for cascade UPDATE. This addressed the immediate embedding bug without disrupting other action types.
v2: Remove cache from check_foreign_key() entirely
After Nathan demonstrated the drop/recreate invalidation problem, Ayush broadened the fix to remove the entire plan cache from check_foreign_key(). Plans are prepared fresh each invocation and released by SPI_finish(). check_primary_key() was left with its cache since its generated SELECT doesn't embed row values.
v3: Remove cache from both functions
Nathan argued there's no principled reason to keep the cache in check_primary_key() either—it's subject to the same invalidation bugs. v3 removes the private plan cache from the entire refint module. Both functions now do SPI_prepare per invocation.
v4: Final cleanup
Per Nathan's review feedback:
- Dropped regression test additions (module may be removed soon)
- Removed historical comments about the cache
- Moved local variable declarations to function top
- Separated pgindent reformatting into a second patch
Architectural Significance
This is a textbook example of why per-trigger plan caches without invalidation are dangerous. PostgreSQL's built-in plan cache (plancache.c) handles invalidation via the sinval mechanism—when catalog changes occur, cached plans are marked invalid and re-prepared on next use. The refint module's hand-rolled cache in a static EPlan array has none of this infrastructure.
The performance cost of removing the cache (one extra SPI_prepare per trigger invocation) is negligible for example/legacy code. The simplicity gain—eliminating an entire class of bugs—is substantial.
The Larger Question: Module Removal
Álvaro Herrera raised whether contrib/spi/refint should simply be removed from the tree entirely. The module's documentation already acknowledges it's superseded by built-in foreign keys. The consensus is:
- Remove in v20 (when the development cycle opens)
- Fix egregious bugs now since the module will remain supported in existing major versions for several years
- A separate thread will be started to propose removal, giving the community a chance to object
Design Tradeoffs
| Approach | Pros | Cons |
|---|---|---|
| Parameterize SET values | Preserves caching | Type-unsafe across different column types |
| Skip cache for UPDATE only | Minimal change | Leaves invalidation bugs for other actions |
| Remove cache entirely | Eliminates all cache bugs | Slight per-call overhead |
| Remove module | Eliminates all bugs | Breaks existing users; needs deprecation cycle |
The chosen approach (remove cache entirely + plan module removal for v20) is the most pragmatic balance of correctness, simplicity, and backward compatibility.