Analysis: Proposed Overflow Guard in contrib/xml2 parse_params
Core Problem (as framed by submitter)
The submitter, Varik Matevosyan, proposed a defensive patch against signed-integer overflow in contrib/xml2's parse_params routine. The specific code pattern under scrutiny is the conventional PostgreSQL idiom of growing a dynamically-sized array by doubling a capacity counter (max_params *= 2 or similar), where the loop terminates because AllocSizeIsValid (or the palloc call itself) rejects the oversized request before the signed integer wraps to a negative value.
The submitter argued two things:
- Relying on signed-integer overflow behavior is technically undefined behavior (UB) in C, even if in practice
AllocSizeIsValidcatches the oversized allocation first. - The current safety is "incidental" — it works only because the
MaxAllocSizeceiling happens to be crossed before overflow occurs.
The submitter acknowledged up front that the overflow is unreachable in current builds (since text input is bounded by MaxAllocSize, bounding nparams below the doubling threshold), framing this as a stylistic/robustness cleanup to align with "explicit overflow-checking idiom used elsewhere in the tree."
Tom Lane's Rejection and the Architectural Reasoning
Tom Lane (core committer, final authority on code-style matters across the tree) rejected the patch with a concise but architecturally important explanation:
-
The UB argument is incorrect in practice.
AllocSizeIsValidrejects any allocation request>= 1GB(i.e.,MaxAllocSize). Because array growth doubles the capacity, the allocation-size check will fail one iteration before the signed integer counter itself could overflowINT_MAX(~2GB worth of element count for typical element sizes, and much earlier for larger structs). Thus the overflow path is genuinely unreachable — not merely unlikely. -
This is the intentional design of
MaxAllocSize. Tom pointed to the comment inutils/memutils.hexplaining thatMaxAllocSizeis deliberately set to0x3fffffff(1GB − 1) rather than 2GB precisely to provide this headroom: it guarantees that size calculations involving doubling or modest multiplication cannot overflow a signed 32-bit integer before the allocator itself rejects the request. In other words, the "incidental" safety the submitter described is actually a deliberate invariant baked into the memory context API contract. -
Consistency argument inverted. The submitter claimed the patch would make
parse_paramsmore consistent with the rest of the tree. Tom countered that the opposite is true: "many many places in our tree" use exactly this idiom, so adding explicit pre-overflow checks here would makeparse_paramsan outlier, not a conformer.
Key Architectural Insight: MaxAllocSize as a Safety Contract
The deeper lesson here is that MaxAllocSize = 0x3fffffff is not an arbitrary limit — it is a load-bearing invariant that callers throughout the codebase implicitly rely on. The 1GB cap (versus a hypothetical 2GB cap which would still fit in a signed int32) provides a guaranteed ~1GB margin before INT_MAX, which is enough to absorb:
- Doubling of a size counter (worst case: request is just under 1GB, doubled would be just under 2GB, still representable).
- Multiplying by small struct sizes in
size * sizeof(T)calculations. - Adding small header overheads.
This means idiomatic PostgreSQL growth loops of the form:
if (nparams >= max_params) {
max_params *= 2;
params = repalloc(params, max_params * sizeof(...));
}
are safe by construction without explicit overflow checks, as long as the element size is reasonable and the per-allocation cap is respected by repalloc/palloc. Adding belt-and-suspenders checks clutters the code and obscures this well-understood invariant.
Tradeoffs and Implications
- Static analyzer noise vs. code clarity. Modern static analyzers and UBSan may flag doubling patterns as potential overflow. The project has historically chosen not to accommodate these warnings when the invariant is guaranteed by
MaxAllocSize. This is a deliberate policy choice privileging code-base uniformity over tool-friendliness. - Portability assumptions. The argument assumes
intis at least 32 bits (true on all supported platforms) and thatMaxAllocSizeremains at 1GB. If a futureHUGE_ALLOCvariant were adopted for palloc with a 2GB+ cap, callers using such variants would need explicit overflow guards — but that is a separate refactor, not an argument for preemptive hardening of the existing idiom. - Rejection signals broader policy. Tom's reply doubles as guidance to future submitters: do not send mechanical "harden against UB" patches for the doubling idiom unless you can demonstrate a reachable overflow path.
Outcome
The patch was rejected on first review with no support from other hackers. The thread is effectively closed after a single committer response. No code change will be made.