Technical Analysis: Fix libxml Leaks in contrib/xml2 XPath Functions
Core Problem
The contrib/xml2 module leaks memory on the successful execution paths of its XPath evaluation functions xpath_list() and xpath_table(). These are not error-handling gaps but ownership errors: the code receives objects allocated by libxml2 and never releases them through libxml2's own deallocation functions.
The libxml2 Memory Model
libxml2 uses its own allocator (defaulting to malloc/free but overridable). Objects allocated by libxml2 functions must be freed using libxml2's deallocation functions (xmlFree(), xmlXPathFreeObject(), etc.) — they are not managed by PostgreSQL's memory context system. This means:
- They are invisible to PostgreSQL's palloc-based memory management
- They survive memory context resets
- They accumulate across repeated function calls within a single backend session
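The consequence of the points above can be illustrated with a small, self-contained sketch. It does not use PostgreSQL or libxml2: ToyContext, toy_alloc(), toy_reset(), foreign_alloc() and foreign_free() are hypothetical stand-ins for a memory context, palloc(), MemoryContextReset() and libxml2's allocator respectively. The only point is that resetting the context has no effect on blocks obtained from the other allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a PostgreSQL memory context: a fixed-size bump arena. */
typedef struct ToyContext
{
    char    buf[1024];
    size_t  used;
} ToyContext;

/* Plays the role of palloc(): carve n bytes out of the context. */
static void *
toy_alloc(ToyContext *cx, size_t n)
{
    void   *p;

    if (n > sizeof(cx->buf) - cx->used)
        return NULL;
    p = cx->buf + cx->used;
    cx->used += n;
    return p;
}

/* Plays the role of MemoryContextReset(): drops everything in the context. */
static void
toy_reset(ToyContext *cx)
{
    cx->used = 0;
}

/* Plays the role of libxml2's allocator: malloc plus a live-block counter. */
static int  foreign_live = 0;

static void *
foreign_alloc(size_t n)
{
    void   *p = malloc(n);

    if (p != NULL)
        foreign_live++;
    return p;
}

static void
foreign_free(void *p)
{
    if (p != NULL)
    {
        free(p);
        foreign_live--;
    }
}

/*
 * Allocate one block from each allocator, reset the context, and report how
 * many foreign blocks are still live at that point.
 */
static int
live_foreign_after_reset(void)
{
    ToyContext  cx = {.used = 0};
    void       *managed = toy_alloc(&cx, 64);
    void       *foreign = foreign_alloc(64);
    int         survivors;

    (void) managed;             /* reclaimed by the reset below */
    toy_reset(&cx);
    survivors = foreign_live;   /* the reset did not touch the foreign block */
    foreign_free(foreign);      /* only an explicit free releases it */
    return survivors;
}
```

In this sketch the foreign block is still counted as live after the reset and disappears only once foreign_free() is called, which mirrors why a forgotten xmlFree() stays leaked for the life of the backend.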
Leak #1: xpath_list() — String Ownership Confusion
In xpath_list(), the code calls xmlXPathCastNodeToString() which returns a newly-allocated xmlChar *. This pointer is passed directly to xmlBufferWriteCHAR(), which copies the string into the buffer rather than taking ownership. The original xmlChar * is never freed.
This is a classic ownership-model misunderstanding: the caller assumes the callee consumes the allocation, but the callee only reads it. Every XPath node result in xpath_list() leaks one libxml string allocation.
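The pattern can be reproduced without libxml2. In the sketch below, cast_node_to_string(), free_string() and buffer_write() are hypothetical stand-ins for xmlXPathCastNodeToString(), xmlFree() and xmlBufferWriteCHAR(); ToyBuffer and the live_strings counter exist only for illustration. The first loop passes the temporary straight to the copying callee, the second keeps the pointer and releases it. This is an approximation of the shape of the bug and its remedy, not the module's actual code.

```c
#include <stdlib.h>
#include <string.h>

static int  live_strings = 0;   /* strings allocated but not yet freed */

/* Stand-in for xmlXPathCastNodeToString(): returns a copy the caller owns. */
static char *
cast_node_to_string(const char *node_text)
{
    size_t  len = strlen(node_text) + 1;
    char   *s = malloc(len);

    if (s != NULL)
    {
        memcpy(s, node_text, len);
        live_strings++;
    }
    return s;
}

/* Stand-in for xmlFree(). */
static void
free_string(char *s)
{
    if (s != NULL)
    {
        free(s);
        live_strings--;
    }
}

typedef struct ToyBuffer
{
    char    data[256];
    size_t  len;
} ToyBuffer;

/* Stand-in for xmlBufferWriteCHAR(): copies src, never takes ownership. */
static void
buffer_write(ToyBuffer *buf, const char *src)
{
    size_t  room = sizeof(buf->data) - 1 - buf->len;
    size_t  n;

    if (src == NULL)
        return;
    n = strlen(src);
    if (n > room)
        n = room;
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
}

/* Leaky shape: the temporary is handed to a callee that only reads it. */
static void
join_nodes_leaky(ToyBuffer *buf, const char **nodes, int n)
{
    for (int i = 0; i < n; i++)
        buffer_write(buf, cast_node_to_string(nodes[i]));
}

/* Corrected shape: keep the pointer, write it, then free it. */
static void
join_nodes_fixed(ToyBuffer *buf, const char **nodes, int n)
{
    for (int i = 0; i < n; i++)
    {
        char   *str = cast_node_to_string(nodes[i]);

        buffer_write(buf, str);
        free_string(str);
    }
}

/* Run one variant over three nodes; return how many strings it left behind. */
static int
leaked_by(void (*join) (ToyBuffer *, const char **, int))
{
    const char *nodes[] = {"a", "b", "c"};
    ToyBuffer   buf = {.len = 0};
    int         before = live_strings;

    join(&buf, nodes, 3);
    return live_strings - before;
}
```

Calling leaked_by(join_nodes_leaky) reports one outstanding string per node, while leaked_by(join_nodes_fixed) reports none.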
Leak #2: xpath_table() — Two Distinct Leak Sites
The xpath_table() function has two problems:
- XPath result objects: xmlXPathCompiledEval() returns an xmlXPathObjectPtr that must be freed with xmlXPathFreeObject(). The function was not doing this after each column evaluation, leaking the entire XPath result structure per column per row.
- String values in the values array: The function stores xmlChar * strings (libxml-allocated) in the values array that feeds BuildTupleFromCStrings(). Since BuildTupleFromCStrings() copies/converts these strings into the result tuple (using palloc'd memory), the original libxml strings become garbage that was never freed.
Architectural Significance
Why This Matters for Long-Running Backends
PostgreSQL backends are long-lived processes. Memory leaked outside the palloc system accumulates for the life of the backend: it cannot be reclaimed by memory context resets, transaction boundaries, or even explicit RESET commands. The only remedy is backend termination.
The reproduction scripts demonstrate steady growth in backend RSS of roughly 1.3 to 2 MB per iteration for xpath_table, and similar growth for xpath_list. In production scenarios with connection pooling (where a single backend serves thousands of queries), this growth is unbounded and can eventually lead to OOM conditions.
Relationship to Recent Error-Handling Work
The author correctly identifies that commit 732061150b0 (from BUG #18943) addressed error-path cleanup — ensuring libxml resources are freed when exceptions occur. These patches address the complementary problem: resources that leak on the success path. Both are necessary for correct resource management.
Proposed Solution
Patch 0001: xpath_list Fix
The fix is straightforward: capture the return value of xmlXPathCastNodeToString() in a local variable, pass it to xmlBufferWriteCHAR(), then call xmlFree() on it. This is a minimal, low-risk change.
Patch 0002: xpath_table Fix
This patch is more involved:
- Free XPath result objects immediately after extraction: After each xmlXPathCompiledEval() call, once the needed value is extracted from the result, call xmlXPathFreeObject().
- Free per-column strings after tuple construction: After BuildTupleFromCStrings() has consumed the values array, iterate through and xmlFree() any libxml-allocated strings.
- Track allocations across PG_TRY blocks: The existing error-handling block is extended to also free any in-flight libxml allocations if an error occurs mid-evaluation. This prevents the leaks from simply moving to the error path.
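The three steps above can be sketched in plain C. This is an illustration under stated assumptions, not the patch itself: setjmp()/longjmp() stand in for PG_TRY and an error raised mid-evaluation, and ToyResult, eval_column(), result_to_string(), foreign_alloc(), foreign_free() and process_row() are hypothetical names standing in for the XPath result object, xmlXPathCompiledEval(), the string extraction, and libxml2's allocator. The fail_at parameter exists only to simulate an error at a chosen column.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

#define NCOLS 3

static int      live_blocks = 0;    /* foreign allocations still outstanding */
static jmp_buf  error_jmp;          /* stand-in for the PG_TRY machinery */

static void *
foreign_alloc(size_t n)
{
    void   *p = malloc(n);

    if (p != NULL)
        live_blocks++;
    return p;
}

static void
foreign_free(void *p)
{
    if (p != NULL)
    {
        free(p);
        live_blocks--;
    }
}

/* Stand-in for an XPath result object. */
typedef struct ToyResult
{
    const char *text;
} ToyResult;

/* Stand-in for evaluating one column; "throws" when col == fail_at. */
static ToyResult *
eval_column(int col, int fail_at)
{
    ToyResult  *res;

    if (col == fail_at)
        longjmp(error_jmp, 1);      /* simulated error mid-evaluation */
    res = foreign_alloc(sizeof(*res));
    if (res != NULL)
        res->text = "value";
    return res;
}

/* Extract a caller-owned string copy from a result object. */
static char *
result_to_string(const ToyResult *res)
{
    size_t  len;
    char   *s;

    if (res == NULL)
        return NULL;
    len = strlen(res->text) + 1;
    s = foreign_alloc(len);
    if (s != NULL)
        memcpy(s, res->text, len);
    return s;
}

/* Process one row; returns 0 on success, -1 if an error was caught. */
static int
process_row(int fail_at)
{
    /* volatile: modified after setjmp() and read again after longjmp() */
    ToyResult  *volatile res = NULL;
    char       *volatile values[NCOLS] = {NULL};
    int         status = 0;

    if (setjmp(error_jmp) == 0)
    {
        for (int col = 0; col < NCOLS; col++)
        {
            res = eval_column(col, fail_at);
            values[col] = result_to_string(res);
            foreign_free(res);      /* step 1: free result after extraction */
            res = NULL;
        }
        /* the tuple would be built here, copying out of values[] */
    }
    else
        status = -1;                /* error path lands here */

    /* steps 2 and 3: release whatever is still held, on either path */
    foreign_free(res);
    for (int col = 0; col < NCOLS; col++)
        foreign_free(values[col]);
    return status;
}
```

Tracking every in-flight pointer in variables that are visible to the cleanup code is what lets a single block of frees serve both the success path and the error path; in this sketch live_blocks returns to zero whether process_row() completes or fails partway through a row.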
Design Considerations
Back-patching Suitability
The author argues these are candidates for back-patching to all supported branches. This is well-justified because:
- The leaks are long-standing (present since the functions were written)
- The fixes are minimal and low-risk
- Memory leaks in production backends are operationally significant
- The changes don't alter any external behavior or API
Why These Weren't Caught Earlier
contrib/xml2 is a legacy module (predating the core xml type's XPath support). It receives less attention than core code, and memory leaks from foreign allocators are invisible to PostgreSQL's memory accounting infrastructure and tools like MemoryContextStats().