[PATCH] pg_bsd_indent: improve formatting of multiline comments

First seen: 2025-06-19 14:51:36+00:00 · Messages: 31 · Participants: 9

Latest Update

2026-05-06 · opus 4.7

Analysis: pg_bsd_indent / pgindent multiline comment formatting

The Core Problem

pg_bsd_indent is PostgreSQL's forked C code formatter (derived from NetBSD's indent, itself descended from code dating to 1976). It is invoked from the Perl wrapper pgindent to enforce PostgreSQL's canonical source formatting. A long-standing minor wart: when a multiline block comment opens with content on the same line as /*, like:

/* name_text()
 * Converts a Name type to a text type.
 */

the project convention is actually:

/*
 * name_text()
 * Converts a Name type to a text type.
 */

Nobody had fixed this because pg_bsd_indent is notoriously difficult to modify. Tom Lane describes it as "mostly unreadable spaghetti"; Bruce Momjian observes the irony that a code-readability tool is itself unreadable, with cryptic short variable names. The project has already forked from the NetBSD upstream, so there is no incoming-merge pressure preventing divergence — but also no one wants to touch the C.

Solution Path: Perl-side, not C-side

Aleksander Alekseev initially attempted a C fix inside pg_bsd_indent, then pivoted (with Bruce's explicit encouragement — "I have always found it easier to modify pgindent") to a post-processing Perl regex pass in src/tools/pgindent/pgindent. The essential transformation is a regex that detects a /* line carrying non-trivial payload and splits it into /*\n * payload.

The iteration history of the patch reveals how many edge cases this naive rewrite has to avoid — this is really the heart of the technical discussion:

  1. Comments without a leading * on every line (e.g., old cube.c style where subsequent lines are indented with plain spaces). Splitting the first line produces a broken mixed-style comment.
  2. Single-line-ish comments where */ sits at the end of the last content line (e.g., crypt-blowfish.c). The opener shouldn't be split because the result is visually lopsided.
  3. Banner/border comments opening with /* ---- or /* ==== or /* ****. Splitting these separates the border from /*, defeating the banner. Nathan Bossart argued strongly (+1) for an exception here because the churn is pure noise without a readability win.
  4. Mixed-format headers (e.g., contrib/seg/seg.c, src/tutorial/complex.c) where the comment opens with a plain /* but uses ****** border lines without leading *. These sit in a gray area; Aleksander suggests authors change the opener to /** to opt out, since /**** openers are already preserved.
  5. Comments with a trailing-whitespace /* opener — Tom Lane's late review found inet_aton.c producing * SUCH DAMAGE. \n */ (trailing space, detected by git diff --check). The cleanup regexes only consumed a single trailing space; they need [ \t]* to handle arbitrary whitespace/tabs.
  6. Latent bugs surfaced by reflow: the patch exposed a handful of real source typos — a stray & instead of * in partbounds.c, weird .* artifact from commit 25a30bb, and a broken continuation line from commit 393e0d2 in waiteventset.c. Nathan noted approvingly that "this patch has been rather good at finding small mistakes like this."
  7. Embedded preprocessor directives inside commentsnbtsort.c had #undef DISABLE_LEADER_PARTICIPATION sitting inside a comment block, which the reflow mangled into *#undef .... The agreed fix (Michael Paquier citing pg_config_manual.h's REALLOCATE_BITMAPSETS as precedent) was to hoist such directives out of the comment entirely; Nathan also flipped #undef to #define so the knob actually does something when uncommented.

Design Tradeoff: Exception List vs Purity

Aleksander's stated preference was minimalism — "I don't like unnecessary exceptions." But the review consensus (Nathan, Michael, Arseniy) pushed toward pragmatic exceptions for banner comments because the churn-to-value ratio is terrible: applying the patch without exceptions changed ~380 files, most of which were banner comments nobody wanted touched. The final v5+ logic therefore carries explicit skips for /* ---, /* ===, and likely /* *** prefixes, plus structural checks (last line containing */, missing leading * on intermediate lines).

Evan Chao also flagged a separate concern: the reflow can re-wrap paragraph text, causing diff churn like

- * Direct advancement: avoid waking non-caught up backends that
- * aren't interested in our notifications.
+ * Direct advancement: avoid waking non-caught up backends that aren't
+ * interested in our notifications.

This is actually pg_bsd_indent's own comment-rewrapping behavior being triggered more frequently because the first-line split now exposes more comments to its paragraph logic. It was not addressed in the patch; it's an inherent cost of normalizing the opener.

The Broader Architectural Conversation

The thread pivots mid-stream into a meta-discussion about the future of pg_bsd_indent itself, initiated by Álvaro Herrera: since PostgreSQL already owns a fork, and since the entire PostgreSQL source tree functions as a massive regression corpus, why not absorb and refactor it?

No decision was reached; this is parked as future work.

Commit and Aftermath

Nathan committed the patch on 2026-02-06. Tom's May 2026 post-commit review found the trailing-whitespace-tolerance bug and the &-vs-* typo surfaced in partbounds.c, plus a pgperltidy complaint about the patch's own formatting (ironic given the subject matter). Aleksander produced v7 fixing the regex classes to [ \t]* and running pgperltidy on the script.

Why This Matters Architecturally

On the surface this is cosmetic. But two deeper points emerge:

  1. The tooling is load-bearing. pgindent runs on every commit's worth of code and defines what "correct PostgreSQL C style" means. A bug here propagates into every future diff. That's why even small changes require ~380-file dry runs and multiple reviewer passes.
  2. The cost of unmaintainable forked tooling compounds. The inability to fix pg_bsd_indent in C for a trivial cosmetic issue forced the fix into a second language (Perl) post-processing layer, which itself has edge cases, which will themselves need future maintenance. Every such workaround makes the next fix harder. The Álvaro/Tom/Nathan/Michael discussion is really about whether to pay down this debt.