Avoid unnecessary StringInfo allocation in tablesync COPY buffer

First seen: 2026-05-09 06:10:50+00:00 · Messages: 2 · Participants: 2

Latest Update

2026-05-11 · opus 4.7

Overview

This short thread concerns a minor but architecturally interesting cleanup in src/backend/replication/logical/tablesync.c, specifically in the copy_table() function used by logical replication's initial table synchronization (tablesync worker) path. The discussion touches on a recurring theme in PostgreSQL source hygiene: the misuse of the StringInfo abstraction as a convenience container when only a subset of its machinery is actually required.

The Core Problem

During the initial data synchronization phase of a logical replication subscription, the tablesync worker issues a COPY command against the publisher and streams the result into the local table via the COPY protocol's server-side reader. The data plumbing is handled by copy_read_data(), which is registered as the callback data source for CopyFrom().

In the current code, copy_table() does:

copybuf = makeStringInfo();

makeStringInfo() performs two palloc operations:

  1. One for the StringInfoData struct itself.
  2. One inside initStringInfo() for the initial 1024-byte growable buffer (data field).

However, copybuf is never used as a growable string buffer. It is exclusively passed to copy_read_data(), which treats it as a pointer-holder — overwriting buf->data with a pointer to the libpq-received row, and tracking buf->len and buf->cursor for partial-read state across calls. The data buffer allocated by initStringInfo() is immediately leaked/overwritten; the maxlen field is meaningless in this usage; no appendStringInfo*() calls ever occur on it.

In other words, StringInfo is being used here solely as an ad-hoc struct of {char *data; int len; int cursor;}. The heap allocation of both the struct and its 1024-byte backing buffer is pure waste, and — arguably more importantly — the code misleads readers about its intent.

Why This Matters Architecturally

PostgreSQL has a well-established pattern distinction:

David Rowley's commit a63bbc811d (referenced by the OP) is part of a broader cleanup campaign converting unnecessary makeStringInfo() uses into stack-allocated StringInfoData. The OP's patch follows that same direction mechanically.

Alvaro Herrera's objection, however, raises the deeper question: this isn't a case where StringInfoData is the right abstraction at all — it's a case where no StringInfo abstraction should be used. The three fields (data, len, cursor) are being used as raw state variables, not as a string builder. Dressing them up in StringInfo clothing is semantically misleading.

Proposed Solutions

Solution A (OP's patch): Stack-allocate StringInfoData

Replace:

StringInfo copybuf;
...
copybuf = makeStringInfo();

with:

StringInfoData copybuf;
...
/* zero-initialize; no growable buffer needed */

Pros: Minimal diff, consistent with Rowley's prior cleanups, avoids two pallocs. Cons: Preserves the semantic lie that this is a StringInfo.

Solution B (Alvaro's suggestion): Use three plain variables

Replace the StringInfo usage entirely with three local variables (e.g., char *data; int len; int cursor;), passed either individually or via a small purpose-built struct to copy_read_data().

Pros: Honest about what the code is doing; removes the cognitive overhead of wondering why StringInfo helpers are never called on this buffer. Cons: Larger patch touching copy_read_data()'s signature; copy_read_data() is a callback whose signature is constrained by CopyFromState's data_source_cb contract — which takes (void *outbuf, int minread, int maxread) — so the "StringInfo" here is actually an internal convention for how tablesync packages its state into the callback closure (cstate->copy_data). This somewhat justifies a struct, but not necessarily a StringInfo.

Key Technical Insights

  1. Callback state smuggling via StringInfo: The tablesync code leverages StringInfo as a convenient pre-existing struct to pass cursor/length/buffer state between copy_table() and the per-row copy_read_data() callback. This is an idiom that appears elsewhere in the COPY code (e.g., the backend COPY FROM protocol handler also reuses a StringInfo for libpq message accumulation), so there's some precedent, but the tablesync usage is degenerate because it never grows the buffer.

  2. libpq result lifetime: copy_read_data() calls PQgetCopyData() with async=1, which returns a malloc'd buffer owned by libpq. The code stores this pointer in copybuf->data and PQfreemem()s it on the next call. This is why the initial 1024-byte buffer from initStringInfo() is immediately orphaned — the data pointer gets overwritten with libpq-owned memory. This is a latent memory leak in the current code (the initial palloc'd buffer is never freed until the memory context is reset), though it's small enough to be harmless.

  3. Consistency with commit a63bbc81: Rowley's cleanup commits establish a clear project-wide preference. A patch following that pattern is low-risk and likely to be accepted, but Alvaro — a senior committer with strong opinions on code clarity — is pushing for the more correct refactor rather than the path of least resistance.

Participant Dynamics

Likely Resolution Path

Given Alvaro's pushback, the productive next step is a v2 patch that either:

Either approach would address both the performance micro-issue and the readability issue that Alvaro highlighted.