Overview
This short thread concerns a minor but architecturally interesting cleanup in src/backend/replication/logical/tablesync.c, specifically in the copy_table() function used by logical replication's initial table synchronization (tablesync worker) path. The discussion touches on a recurring theme in PostgreSQL source hygiene: the misuse of the StringInfo abstraction as a convenience container when only a subset of its machinery is actually required.
The Core Problem
During the initial data synchronization phase of a logical replication subscription, the tablesync worker issues a COPY command against the publisher and streams the result into the local table via the COPY protocol's server-side reader. The data plumbing is handled by copy_read_data(), which is registered as the callback data source for CopyFrom().
In the current code, copy_table() does:
copybuf = makeStringInfo();
makeStringInfo() performs two palloc operations:
- One for the
StringInfoDatastruct itself. - One inside
initStringInfo()for the initial 1024-byte growable buffer (datafield).
However, copybuf is never used as a growable string buffer. It is exclusively passed to copy_read_data(), which treats it as a pointer-holder — overwriting buf->data with a pointer to the libpq-received row, and tracking buf->len and buf->cursor for partial-read state across calls. The data buffer allocated by initStringInfo() is immediately leaked/overwritten; the maxlen field is meaningless in this usage; no appendStringInfo*() calls ever occur on it.
In other words, StringInfo is being used here solely as an ad-hoc struct of {char *data; int len; int cursor;}. The heap allocation of both the struct and its 1024-byte backing buffer is pure waste, and — arguably more importantly — the code misleads readers about its intent.
Why This Matters Architecturally
PostgreSQL has a well-established pattern distinction:
StringInfo(pointer) +makeStringInfo(): used when the buffer's lifetime must outlive the current stack frame, or when it will be returned/stored.StringInfoData(stack-allocated) +initStringInfo(): preferred when the buffer is locally scoped, avoiding an extra palloc and pointer indirection.
David Rowley's commit a63bbc811d (referenced by the OP) is part of a broader cleanup campaign converting unnecessary makeStringInfo() uses into stack-allocated StringInfoData. The OP's patch follows that same direction mechanically.
Alvaro Herrera's objection, however, raises the deeper question: this isn't a case where StringInfoData is the right abstraction at all — it's a case where no StringInfo abstraction should be used. The three fields (data, len, cursor) are being used as raw state variables, not as a string builder. Dressing them up in StringInfo clothing is semantically misleading.
Proposed Solutions
Solution A (OP's patch): Stack-allocate StringInfoData
Replace:
StringInfo copybuf;
...
copybuf = makeStringInfo();
with:
StringInfoData copybuf;
...
/* zero-initialize; no growable buffer needed */
Pros: Minimal diff, consistent with Rowley's prior cleanups, avoids two pallocs. Cons: Preserves the semantic lie that this is a StringInfo.
Solution B (Alvaro's suggestion): Use three plain variables
Replace the StringInfo usage entirely with three local variables (e.g., char *data; int len; int cursor;), passed either individually or via a small purpose-built struct to copy_read_data().
Pros: Honest about what the code is doing; removes the cognitive overhead of wondering why StringInfo helpers are never called on this buffer.
Cons: Larger patch touching copy_read_data()'s signature; copy_read_data() is a callback whose signature is constrained by CopyFromState's data_source_cb contract — which takes (void *outbuf, int minread, int maxread) — so the "StringInfo" here is actually an internal convention for how tablesync packages its state into the callback closure (cstate->copy_data). This somewhat justifies a struct, but not necessarily a StringInfo.
Key Technical Insights
-
Callback state smuggling via StringInfo: The tablesync code leverages
StringInfoas a convenient pre-existing struct to pass cursor/length/buffer state betweencopy_table()and the per-rowcopy_read_data()callback. This is an idiom that appears elsewhere in the COPY code (e.g., the backend COPY FROM protocol handler also reuses a StringInfo for libpq message accumulation), so there's some precedent, but the tablesync usage is degenerate because it never grows the buffer. -
libpq result lifetime:
copy_read_data()callsPQgetCopyData()withasync=1, which returns a malloc'd buffer owned by libpq. The code stores this pointer incopybuf->dataandPQfreemem()s it on the next call. This is why the initial 1024-byte buffer frominitStringInfo()is immediately orphaned — thedatapointer gets overwritten with libpq-owned memory. This is a latent memory leak in the current code (the initialpalloc'd buffer is never freed until the memory context is reset), though it's small enough to be harmless. -
Consistency with commit a63bbc81: Rowley's cleanup commits establish a clear project-wide preference. A patch following that pattern is low-risk and likely to be accepted, but Alvaro — a senior committer with strong opinions on code clarity — is pushing for the more correct refactor rather than the path of least resistance.
Participant Dynamics
-
Evan Li (OP): Submitted the patch as a spin-off from reviewing another patch. Framing is conservative: "make this consistent with existing cleanups." This is a reasonable newcomer-friendly framing.
-
Álvaro Herrera: Long-time core committer. His response is characteristically direct — he's identifying that the OP's patch treats a symptom (extra palloc) rather than the disease (abstraction misuse). His stance carries significant weight; a patch author would be well-advised to produce a v2 that uses plain variables or a small purpose-built struct rather than merely swapping
StringInfoforStringInfoData.
Likely Resolution Path
Given Alvaro's pushback, the productive next step is a v2 patch that either:
- Introduces a small local struct like
typedef struct CopyReadState { char *data; int len; int cursor; } CopyReadState;used as the callback's private state, or - Uses three loose local variables captured via a purpose-built tiny struct at the
copy_table()level.
Either approach would address both the performance micro-issue and the readability issue that Alvaro highlighted.