Analysis: pg_bsd_indent / pgindent multiline comment formatting
The Core Problem
pg_bsd_indent is PostgreSQL's forked C code formatter (derived from NetBSD's indent, itself descended from code dating to 1976). It is invoked from the Perl wrapper pgindent to enforce PostgreSQL's canonical source formatting. A long-standing minor wart: when a multiline block comment opens with content on the same line as /*, like:
/* name_text()
* Converts a Name type to a text type.
*/
the project convention is actually:
/*
* name_text()
* Converts a Name type to a text type.
*/
Nobody had fixed this because pg_bsd_indent is notoriously difficult to modify. Tom Lane describes it as "mostly unreadable spaghetti"; Bruce Momjian observes the irony that a code-readability tool is itself unreadable, with cryptic short variable names. The project has already forked from the NetBSD upstream, so there is no incoming-merge pressure preventing divergence — but also no one wants to touch the C.
Solution Path: Perl-side, not C-side
Aleksander Alekseev initially attempted a C fix inside pg_bsd_indent, then pivoted (with Bruce's explicit encouragement — "I have always found it easier to modify pgindent") to a post-processing Perl regex pass in src/tools/pgindent/pgindent. The essential transformation is a regex that detects a /* line carrying non-trivial payload and splits it into /*\n * payload.
The iteration history of the patch reveals how many edge cases this naive rewrite has to avoid — this is really the heart of the technical discussion:
- Comments without a leading
*on every line (e.g., oldcube.cstyle where subsequent lines are indented with plain spaces). Splitting the first line produces a broken mixed-style comment. - Single-line-ish comments where
*/sits at the end of the last content line (e.g.,crypt-blowfish.c). The opener shouldn't be split because the result is visually lopsided. - Banner/border comments opening with
/* ----or/* ====or/* ****. Splitting these separates the border from/*, defeating the banner. Nathan Bossart argued strongly (+1) for an exception here because the churn is pure noise without a readability win. - Mixed-format headers (e.g.,
contrib/seg/seg.c,src/tutorial/complex.c) where the comment opens with a plain/*but uses******border lines without leading*. These sit in a gray area; Aleksander suggests authors change the opener to/**to opt out, since/****openers are already preserved. - Comments with a trailing-whitespace
/*opener — Tom Lane's late review foundinet_aton.cproducing* SUCH DAMAGE. \n */(trailing space, detected bygit diff --check). The cleanup regexes only consumed a single trailing space; they need[ \t]*to handle arbitrary whitespace/tabs. - Latent bugs surfaced by reflow: the patch exposed a handful of real source typos — a stray
&instead of*inpartbounds.c, weird.*artifact from commit 25a30bb, and a broken continuation line from commit 393e0d2 inwaiteventset.c. Nathan noted approvingly that "this patch has been rather good at finding small mistakes like this." - Embedded preprocessor directives inside comments —
nbtsort.chad#undef DISABLE_LEADER_PARTICIPATIONsitting inside a comment block, which the reflow mangled into*#undef .... The agreed fix (Michael Paquier citingpg_config_manual.h'sREALLOCATE_BITMAPSETSas precedent) was to hoist such directives out of the comment entirely; Nathan also flipped#undefto#defineso the knob actually does something when uncommented.
Design Tradeoff: Exception List vs Purity
Aleksander's stated preference was minimalism — "I don't like unnecessary exceptions." But the review consensus (Nathan, Michael, Arseniy) pushed toward pragmatic exceptions for banner comments because the churn-to-value ratio is terrible: applying the patch without exceptions changed ~380 files, most of which were banner comments nobody wanted touched. The final v5+ logic therefore carries explicit skips for /* ---, /* ===, and likely /* *** prefixes, plus structural checks (last line containing */, missing leading * on intermediate lines).
Evan Chao also flagged a separate concern: the reflow can re-wrap paragraph text, causing diff churn like
- * Direct advancement: avoid waking non-caught up backends that
- * aren't interested in our notifications.
+ * Direct advancement: avoid waking non-caught up backends that aren't
+ * interested in our notifications.
This is actually pg_bsd_indent's own comment-rewrapping behavior being triggered more frequently because the first-line split now exposes more comments to its paragraph logic. It was not addressed in the patch; it's an inherent cost of normalizing the opener.
The Broader Architectural Conversation
The thread pivots mid-stream into a meta-discussion about the future of pg_bsd_indent itself, initiated by Álvaro Herrera: since PostgreSQL already owns a fork, and since the entire PostgreSQL source tree functions as a massive regression corpus, why not absorb and refactor it?
- Tom Lane (committer, deep history here): confirms the code is spaghetti but agrees divergence is fine; skeptical a cleanup is tractable.
- Nathan Bossart (committer, eventual committer of this patch): is generally anti-rewrite but thinks this specific codebase may warrant one, because he couldn't debug it by any means other than trial and error.
- Tom further floats an ambitious goal: eliminate
typedefs.list. C's grammar is famously ambiguous without knowing which identifiers are typedefs (the "lexer hack"), but other indenters manage heuristically; for formatting purposes ambiguity may be acceptable. He stops short of recommending the work. - Michael Paquier proposes the opposite direction: absorb
pgindent's Perl logic into the C binary, removing one layer rather than rewriting the other. - Aleksander suggests a Python rewrite, contingent on the PyTest migration patchset landing first.
- Bruce notes the Perl wrapper exists precisely because of
pg_bsd_indent's limitations — it's compensation, not design.
No decision was reached; this is parked as future work.
Commit and Aftermath
Nathan committed the patch on 2026-02-06. Tom's May 2026 post-commit review found the trailing-whitespace-tolerance bug and the &-vs-* typo surfaced in partbounds.c, plus a pgperltidy complaint about the patch's own formatting (ironic given the subject matter). Aleksander produced v7 fixing the regex classes to [ \t]* and running pgperltidy on the script.
Why This Matters Architecturally
On the surface this is cosmetic. But two deeper points emerge:
- The tooling is load-bearing.
pgindentruns on every commit's worth of code and defines what "correct PostgreSQL C style" means. A bug here propagates into every future diff. That's why even small changes require ~380-file dry runs and multiple reviewer passes. - The cost of unmaintainable forked tooling compounds. The inability to fix
pg_bsd_indentin C for a trivial cosmetic issue forced the fix into a second language (Perl) post-processing layer, which itself has edge cases, which will themselves need future maintenance. Every such workaround makes the next fix harder. The Álvaro/Tom/Nathan/Michael discussion is really about whether to pay down this debt.