Monthly Summary: Make printtup a bit faster (May 2026)
Overview
After a two-year hiatus (2024–2026), this thread reached design consensus in May 2026. The long-standing debate over how to eliminate the palloc + strlen + memcpy overhead in PostgreSQL's tuple output path converged on Andres Freund's fcinfo-carried-context approach, with a rough prototype posted and all active participants aligned.
The Problem
printtup — the DestReceiver callback formatting tuples for the wire protocol — dominates CPU profiles (~85% self+children on trivial queries). The root cause is the typoutput function API: every output function returns a freshly palloc'd cstring, forcing the caller to strlen() the result (recovering a length the producer already knew), memcpy it into the send buffer, and eventually free the allocation. For high-throughput queries this produces three redundant copies and a pointless strlen per datum.
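The per-datum cost described above can be sketched in standalone C. This is a minimal illustration, not PostgreSQL code: `Buf` is a hypothetical stand-in for `StringInfo`, `malloc`/`free` stand in for `palloc`/`pfree`, and all function names are invented for the example. `send_legacy` follows the allocate, `strlen`, `memcpy`, free sequence; `send_direct` formats straight into the destination buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's StringInfo: a growable byte buffer. */
typedef struct Buf {
    char   *data;
    size_t  len;
    size_t  cap;
} Buf;

static void buf_init(Buf *b)
{
    b->cap = 64;
    b->len = 0;
    b->data = malloc(b->cap);
    assert(b->data != NULL);
    b->data[0] = '\0';
}

/* Make room for `extra` more bytes plus a trailing NUL. */
static void buf_reserve(Buf *b, size_t extra)
{
    size_t need = b->len + extra + 1;

    if (need <= b->cap)
        return;
    while (b->cap < need)
        b->cap *= 2;
    b->data = realloc(b->data, b->cap);
    assert(b->data != NULL);
}

static void buf_append(Buf *b, const char *src, size_t n)
{
    buf_reserve(b, n);
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Legacy-style output function: fresh allocation, length is thrown away. */
static char *legacy_int4_out(int32_t v)
{
    char *s = malloc(12);           /* "-2147483648" plus NUL */

    assert(s != NULL);
    snprintf(s, 12, "%" PRId32, v);
    return s;
}

/* Caller side of the legacy path: strlen, memcpy, free for every datum. */
static void send_legacy(Buf *b, int32_t v)
{
    char  *s = legacy_int4_out(v);
    size_t n = strlen(s);           /* recomputes what the producer knew */

    buf_append(b, s, n);            /* second copy of the same bytes */
    free(s);
}

/* Direct path: format into the destination, no temporary, no strlen. */
static void send_direct(Buf *b, int32_t v)
{
    int n;

    buf_reserve(b, 11);
    n = snprintf(b->data + b->len, 12, "%" PRId32, v);
    assert(n > 0 && n < 12);
    b->len += (size_t) n;
}
```

Both paths produce identical bytes in the buffer; the difference is only in the work done per datum.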
Design Convergence
Three competing designs were evaluated:
- Andy Fan's {type}print functions: optional per-type functions taking (Datum, StringInfo), hardcoded for common types. Andy withdrew this in favor of the consensus approach.
- David Rowley's full signature rewrite: change typoutput itself to (Datum, StringInfo). Not defended after Tom Lane's objections about permanent backward-compatibility dispatch costs.
- Andres Freund's fcinfo-context approach (selected): pass an optional "output context" (containing the caller's StringInfo) through FunctionCallInfo. Opt-in output functions append directly to the caller's buffer; legacy functions continue returning cstring unchanged. Uses PG_WINDOW_OBJECT() / WindowObjectIsValid() as the in-tree precedent for smuggling typed side-channels through fcinfo.
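The opt-in dispatch of the selected design can be sketched as follows. This is a standalone approximation under stated assumptions, not the posted prototype: `CallInfo`, `OutputContext`, `T_OutputContext`, `get_output_context`, and `emit_datum` are hypothetical names, and `Buf` is a fixed-size stand-in for `StringInfo`. The only idea borrowed from PostgreSQL is that the call-info context pointer refers to a tagged node that the callee checks before using it.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for StringInfo, to keep the sketch short. */
typedef struct Buf { char data[256]; size_t len; } Buf;

static void buf_append(Buf *b, const char *s, size_t n)
{
    assert(b->len + n < sizeof(b->data));
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Hypothetical node tag; the tag is the first member so it can be inspected. */
typedef enum NodeTag { T_Invalid = 0, T_OutputContext = 1 } NodeTag;
typedef struct OutputContext { NodeTag type; Buf *dest; } OutputContext;

/* Hypothetical, heavily reduced stand-in for FunctionCallInfo. */
typedef struct CallInfo { void *context; int32_t arg; } CallInfo;

/* Return the output context if the caller supplied one, else NULL. */
static OutputContext *get_output_context(CallInfo *fcinfo)
{
    if (fcinfo->context != NULL &&
        *(NodeTag *) fcinfo->context == T_OutputContext)
        return (OutputContext *) fcinfo->context;
    return NULL;
}

/* Unconverted function: ignores the context and returns a fresh cstring. */
static char *int4_out_legacy(CallInfo *fcinfo)
{
    char *s = malloc(12);

    assert(s != NULL);
    snprintf(s, 12, "%" PRId32, fcinfo->arg);
    return s;
}

/* Converted function: appends to the caller's buffer when a context exists. */
static char *int4_out_optin(CallInfo *fcinfo)
{
    OutputContext *ctx = get_output_context(fcinfo);
    char tmp[12];
    int  n;

    if (ctx == NULL)
        return int4_out_legacy(fcinfo);   /* old callers still work */
    n = snprintf(tmp, sizeof(tmp), "%" PRId32, fcinfo->arg);
    assert(n > 0 && (size_t) n < sizeof(tmp));
    buf_append(ctx->dest, tmp, (size_t) n);
    return NULL;                          /* signals "already appended" */
}

typedef char *(*OutFn)(CallInfo *);

/* Caller: always offers a context, and copies only if the callee declined. */
static void emit_datum(Buf *dest, OutFn fn, int32_t value)
{
    OutputContext ctx = { T_OutputContext, dest };
    CallInfo fcinfo = { &ctx, value };
    char *s = fn(&fcinfo);

    if (s != NULL) {
        buf_append(dest, s, strlen(s));
        free(s);
    }
}
```

Returning NULL to mean "already appended" is a choice made for this sketch only; how the real patch signals that the context was used is not stated in the summary.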
Tom Lane endorsed the rollout strategy modeled on commit d9f7f5d32 (soft error reporting for input functions): infrastructure first, convert a handful of high-value callees as demonstration, no flag day.
Prototype Posted
Andres posted a two-patch sketch (2026-05-06):
- Patch 1 (independent): Refactors pg_server_to_client() to write directly into a caller-supplied StringInfo instead of pessimistically allocating and copying. Expected to yield measurable improvement on its own and is a prerequisite for the full optimization.
- Patch 2 (depends on 1): Demonstrates the fcinfo-context mechanism: the caller sets up a context struct (including destination StringInfo) in fcinfo before calling the output function; opt-in functions detect and use it.
The prototype is self-described as "very rough" and not yet benchmarkable, but establishes the concrete shape of the solution.
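The idea behind Patch 1 can be illustrated with a toy encoding conversion. The sketch below is an assumption-laden stand-in, not the patch: it uses a Latin-1 to UTF-8 conversion because its worst-case expansion (two output bytes per input byte) is easy to state, `Buf` is a hypothetical fixed-size substitute for `StringInfo`, and both function names are invented. `send_with_temp` allocates a worst-case temporary and copies it; `latin1_to_utf8_into` writes into the caller's buffer directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for StringInfo. */
typedef struct Buf { char data[256]; size_t len; } Buf;

/* Convert one Latin-1 byte to UTF-8 at `out`; return the advanced pointer. */
static char *put_utf8(char *out, unsigned char c)
{
    if (c < 0x80) {
        *out++ = (char) c;
    } else {
        *out++ = (char) (0xC0 | (c >> 6));
        *out++ = (char) (0x80 | (c & 0x3F));
    }
    return out;
}

/*
 * "Before" shape: allocate a worst-case temporary (2 bytes per input byte),
 * convert into it, then copy the result into the destination and free it.
 */
static void send_with_temp(Buf *dest, const unsigned char *src, size_t len)
{
    char  *tmp = malloc(2 * len + 1);
    char  *out = tmp;
    size_t n;

    assert(tmp != NULL);
    for (size_t i = 0; i < len; i++)
        out = put_utf8(out, src[i]);
    n = (size_t) (out - tmp);

    assert(dest->len + n < sizeof(dest->data));
    memcpy(dest->data + dest->len, tmp, n);     /* the extra copy */
    dest->len += n;
    dest->data[dest->len] = '\0';
    free(tmp);
}

/*
 * "After" shape: reserve the worst case in the destination once, then
 * convert straight into it. No temporary allocation and no second copy.
 */
static void latin1_to_utf8_into(Buf *dest, const unsigned char *src, size_t len)
{
    char *out = dest->data + dest->len;

    assert(dest->len + 2 * len < sizeof(dest->data));
    for (size_t i = 0; i < len; i++)
        out = put_utf8(out, src[i]);
    *out = '\0';
    dest->len = (size_t) (out - dest->data);
}
```

For the four-byte Latin-1 input `c a f 0xE9`, both functions leave the same five UTF-8 bytes in the buffer; only the second avoids the temporary.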
Key Technical Insights
- Converting array_out + record_out covers ~82% of types by catalog count (510/621 pg_type rows), and since these recursively call per-element output functions, scalar conversions compound.
- Only high-value types need conversion: textout, byteaout, int[248]out, timestamp variants. Tom explicitly doubts numeric_out or point_out would show measurable wins.
- initReadOnlyStringInfo can wrap legacy cstring returns without copying, minimizing dispatch overhead for unconverted functions.
- Input-side length propagation (enabling SIMD pg_strtoint32_safe, COPY delimiter scanning) is identified as a parallel project with the same architectural motivation.
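Why container conversions compound with scalar ones can be seen in a small standalone sketch. Everything here is hypothetical (`Buf`, the function names, and the `temp_allocs` counter are invented for illustration, and the real `array_out` handles quoting, NULLs, and dimensions that are omitted): a container output function that appends to a shared buffer pays one temporary allocation per element while the element function is unconverted, and none once it is converted.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for StringInfo. */
typedef struct Buf { char data[256]; size_t len; } Buf;

/* Counts temporary allocations so the two paths can be compared. */
static int temp_allocs = 0;

static void buf_append(Buf *b, const char *s, size_t n)
{
    assert(b->len + n < sizeof(b->data));
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Unconverted element output: one fresh allocation per element. */
static char *int4_out_legacy(int32_t v)
{
    char *s = malloc(12);

    assert(s != NULL);
    temp_allocs++;
    snprintf(s, 12, "%" PRId32, v);
    return s;
}

/* Converted element output: formats on the stack and appends. */
static void int4_out_into(Buf *dest, int32_t v)
{
    char tmp[12];
    int  n = snprintf(tmp, sizeof(tmp), "%" PRId32, v);

    assert(n > 0 && (size_t) n < sizeof(tmp));
    buf_append(dest, tmp, (size_t) n);
}

/* Simplified array output: "{e1,e2,...}" appended to a shared buffer. */
static void array_out_into(Buf *dest, const int32_t *elems, size_t n,
                           int use_legacy)
{
    buf_append(dest, "{", 1);
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            buf_append(dest, ",", 1);
        if (use_legacy) {
            char *s = int4_out_legacy(elems[i]);

            buf_append(dest, s, strlen(s));   /* strlen + copy per element */
            free(s);
        } else {
            int4_out_into(dest, elems[i]);    /* no temporary at all */
        }
    }
    buf_append(dest, "}", 1);
}
```

With a three-element array, the legacy element path performs three temporary allocations and the converted path performs none, while the emitted text is identical.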
Prior Benchmark
Andy's earlier PoC showed ~18% improvement (0.134ms → 0.110ms) on SELECT * FROM demo with oid/text columns, using the now-superseded {type}print approach. The fcinfo-context approach should yield similar or better gains with cleaner architecture.
Status
The design debate is closed. Next steps:
- Andres polishes Patch 1 (pg_server_to_client StringInfo refactor) for independent commit
- Patch 2 matures with proper helper macros and benchmark validation
- High-value output functions converted incrementally per the soft-error-reporting playbook