[PATCH] Fix libxml leaks in contrib/xml2 XPath functions

First seen: 2026-05-31 22:01:24+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-06-01 · claude-opus-4-6

Technical Analysis: Fix libxml Leaks in contrib/xml2 XPath Functions

Core Problem

The contrib/xml2 module leaks libxml2 memory on successful execution paths in two of its XPath evaluation functions, xpath_list() and xpath_table(). These are not error-handling gaps but ownership errors: the code never frees allocations it receives from libxml2 and is responsible for releasing.

The libxml2 Memory Model

libxml2 uses its own allocator (defaulting to malloc/free but overridable). Objects allocated by libxml2 functions must be freed using libxml2's deallocation functions (xmlFree(), xmlXPathFreeObject(), etc.) — they are not managed by PostgreSQL's memory context system. This means:

  1. They are invisible to PostgreSQL's palloc-based memory management
  2. They survive memory context resets
  3. They accumulate across repeated function calls within a single backend session
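
The three properties above can be illustrated with a minimal sketch that does not use PostgreSQL or libxml2 at all. Everything in it is a hypothetical stand-in: Arena models a memory context, arena_reset() models a context reset, and foreign_strdup()/foreign_free() model a libxml2-style allocator backed by malloc/free. The point is only that resetting the arena has no effect on the foreign allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PostgreSQL memory context: a tiny bump arena. */
typedef struct Arena
{
    char   buf[1024];
    size_t used;
} Arena;

static void *arena_alloc(Arena *a, size_t n)
{
    void *p;

    assert(a->used + n <= sizeof(a->buf));
    p = a->buf + a->used;
    a->used += n;
    return p;
}

/* A reset reclaims everything the arena handed out, all at once. */
static void arena_reset(Arena *a)
{
    a->used = 0;
}

/* Hypothetical stand-in for the libxml2 allocator: counted malloc/free. */
static int foreign_live = 0;

static char *foreign_strdup(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    assert(p != NULL);
    strcpy(p, s);
    foreign_live++;
    return p;
}

static void foreign_free(char *p)
{
    if (p != NULL)
    {
        free(p);
        foreign_live--;
    }
}

#define MAX_CALLS 16

/*
 * Simulate `calls` function invocations, each followed by an arena reset,
 * and report how many foreign allocations are still live afterwards.
 */
static int live_after_resets(int calls)
{
    Arena a = { .used = 0 };
    char *held[MAX_CALLS];
    int   live;

    assert(calls >= 0 && calls <= MAX_CALLS);
    for (int i = 0; i < calls; i++)
    {
        (void) arena_alloc(&a, 32);             /* reclaimed by the reset */
        held[i] = foreign_strdup("node text");  /* not reclaimed by it */
        arena_reset(&a);
    }
    live = foreign_live;                        /* one per call survives */

    /* Only an explicit free releases the foreign allocations. */
    for (int i = 0; i < calls; i++)
        foreign_free(held[i]);
    return live;
}
```

Calling live_after_resets(5) reports five live foreign allocations even though the arena was reset after every call, which is the accumulation pattern described in point 3.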

Leak #1: xpath_list() — String Ownership Confusion

In xpath_list(), the code calls xmlXPathCastNodeToString() which returns a newly-allocated xmlChar *. This pointer is passed directly to xmlBufferWriteCHAR(), which copies the string into the buffer rather than taking ownership. The original xmlChar * is never freed.

This is a classic ownership-model misunderstanding: the caller assumes the callee consumes the allocation, but the callee only reads it. Every XPath node result in xpath_list() leaks one libxml string allocation.

Leak #2: xpath_table() — Two Distinct Leak Sites

The xpath_table() function has two problems:

  1. XPath result objects: xmlXPathCompiledEval() returns an xmlXPathObjectPtr that must be freed with xmlXPathFreeObject(). The function was not doing this after each column evaluation, leaking the entire XPath result structure per column per row.

  2. String values in the values array: The function stores xmlChar * strings (libxml-allocated) in the values array that feeds BuildTupleFromCStrings(). Since BuildTupleFromCStrings() copies/converts these strings into the result tuple (using palloc'd memory), the original libxml strings are no longer needed once the tuple is built, yet they are never freed.

Architectural Significance

Why This Matters for Long-Running Backends

PostgreSQL backends are long-lived processes. Memory leaked outside the palloc system accumulates indefinitely: it is not reclaimed by memory context resets, at transaction boundaries, or by session-reset commands such as DISCARD ALL. The only remedy is backend termination.

The reproduction scripts demonstrate steady growth in backend RSS of roughly 1.3 to 2 MB per iteration for xpath_table, and similar growth for xpath_list. In production scenarios with connection pooling, where a single backend serves thousands of queries, this growth is unbounded and can lead to OOM conditions.

Relationship to Recent Error-Handling Work

The author correctly identifies that commit 732061150b0 (from BUG #18943) addressed error-path cleanup — ensuring libxml resources are freed when exceptions occur. These patches address the complementary problem: resources that leak on the success path. Both are necessary for correct resource management.

Proposed Solution

Patch 0001: xpath_list Fix

The fix is straightforward: capture the return value of xmlXPathCastNodeToString() in a local variable, pass it to xmlBufferWriteCHAR(), then call xmlFree() on it. This is a minimal, low-risk change.
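
The capture, write, free sequence can be sketched without libxml2. This is not the patch itself: cast_node_to_string(), buffer_write(), and free_string() are hypothetical stand-ins for xmlXPathCastNodeToString(), xmlBufferWriteCHAR(), and xmlFree(), and the Buffer type and separator handling are assumptions made only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int live_strings = 0;    /* outstanding stand-in "libxml" strings */

/* Stand-in for xmlXPathCastNodeToString(): returns a string the caller owns. */
static char *cast_node_to_string(const char *node_text)
{
    char *s = malloc(strlen(node_text) + 1);

    assert(s != NULL);
    strcpy(s, node_text);
    live_strings++;
    return s;
}

/* Stand-in for xmlFree(). */
static void free_string(char *s)
{
    if (s != NULL)
    {
        free(s);
        live_strings--;
    }
}

/* Stand-in for an xmlBuffer: fixed-size, enough for the example. */
typedef struct Buffer
{
    char   data[256];
    size_t len;
} Buffer;

/* Stand-in for xmlBufferWriteCHAR(): copies the text, takes no ownership. */
static void buffer_write(Buffer *buf, const char *s)
{
    size_t n = strlen(s);

    assert(buf->len + n < sizeof(buf->data));
    memcpy(buf->data + buf->len, s, n + 1);
    buf->len += n;
}

/* Fixed pattern: capture the string, write it, then free the original. */
static void append_nodes(Buffer *buf, const char **nodes, int n, const char *sep)
{
    for (int i = 0; i < n; i++)
    {
        char *str = cast_node_to_string(nodes[i]);  /* caller owns str */

        buffer_write(buf, str);                     /* buffer keeps a copy */
        free_string(str);                           /* release the original */
        if (i < n - 1)
            buffer_write(buf, sep);
    }
}
```

The leaky form corresponds to passing the result of cast_node_to_string() straight into buffer_write(), which leaves no pointer behind to free. With the local variable in place, live_strings returns to zero after every call.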

Patch 0002: xpath_table Fix

This patch is more involved:

  1. Free XPath result objects immediately after extraction: After each xmlXPathCompiledEval() call, once the needed value is extracted from the result, call xmlXPathFreeObject().

  2. Free per-column strings after tuple construction: After BuildTupleFromCStrings() has consumed the values array, iterate through and xmlFree() any libxml-allocated strings.

  3. Track allocations across PG_TRY blocks: The existing error-handling block is extended to also free any in-flight libxml allocations if an error occurs mid-evaluation. This prevents the leaks from simply moving to the error path.
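
The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch: eval_column(), free_result(), result_to_string(), free_string(), and build_tuple() are hypothetical stand-ins for xmlXPathCompiledEval(), xmlXPathFreeObject(), the string extraction step, xmlFree(), and BuildTupleFromCStrings(). PostgreSQL's PG_TRY-based error handling is replaced here by a return code and a shared cleanup label, which is a substitution made only to keep the example standalone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NCOLS 3

static int live = 0;    /* outstanding stand-in "libxml" allocations */

/* Stand-in for an XPath result object. */
typedef struct Result
{
    const char *text;
} Result;

/* Stand-in for xmlXPathCompiledEval(); a NULL cell models a failure. */
static Result *eval_column(const char *cell)
{
    Result *r;

    if (cell == NULL)
        return NULL;
    r = malloc(sizeof(Result));
    assert(r != NULL);
    r->text = cell;
    live++;
    return r;
}

/* Stand-in for xmlXPathFreeObject(). */
static void free_result(Result *r)
{
    if (r != NULL)
    {
        free(r);
        live--;
    }
}

/* Stand-in for the extraction step: returns a caller-owned copy. */
static char *result_to_string(const Result *r)
{
    char *s = malloc(strlen(r->text) + 1);

    assert(s != NULL);
    strcpy(s, r->text);
    live++;
    return s;
}

/* Stand-in for xmlFree(). */
static void free_string(char *s)
{
    if (s != NULL)
    {
        free(s);
        live--;
    }
}

/* Stand-in for BuildTupleFromCStrings(): copies values into its own storage. */
static void build_tuple(char *out, size_t outsz, char **values)
{
    out[0] = '\0';
    for (int c = 0; c < NCOLS; c++)
    {
        strncat(out, values[c], outsz - strlen(out) - 1);
        if (c < NCOLS - 1)
            strncat(out, "|", outsz - strlen(out) - 1);
    }
}

/* Returns 0 on success, -1 on failure; either way nothing stays allocated. */
static int process_row(const char **cells, char *out, size_t outsz)
{
    char *values[NCOLS] = { NULL };
    int   rc = -1;

    for (int c = 0; c < NCOLS; c++)
    {
        Result *res = eval_column(cells[c]);

        if (res == NULL)
            goto cleanup;                   /* error mid-row */
        values[c] = result_to_string(res);  /* step 1: extract the value */
        free_result(res);                   /*         then free the result */
    }
    build_tuple(out, outsz, values);        /* tuple holds its own copies */
    rc = 0;

cleanup:
    /* Steps 2 and 3: the same loop serves the success and error paths. */
    for (int c = 0; c < NCOLS; c++)
        free_string(values[c]);             /* NULL slots are skipped */
    return rc;
}
```

Initializing every slot of values to NULL is what lets one cleanup loop serve both paths: slots that were never filled because evaluation failed early are simply skipped.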

Design Considerations

Back-patching Suitability

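The author argues these are candidates for back-patching to all supported branches. This is well-justified: the leaks occur on ordinary success paths and accumulate for the lifetime of a backend, and the xpath_list fix in particular is a minimal, low-risk change.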

Why These Weren't Caught Earlier

contrib/xml2 is a legacy module (predating the core xml type's XPath support). It receives less attention than core code, and memory leaks from foreign allocators are invisible to PostgreSQL's memory accounting infrastructure and tools like MemoryContextStats().