Analysis: pg_bsd_indent / pgindent — Formatting of Multiline Comments
Core Problem
PostgreSQL's source tree has historically tolerated two common styles for multiline block comments:
/* This is line 1 vs. /*
* This is line 2 * This is line 1
*/ * This is line 2
*/
The project's written style guide prefers the second form (a "bare" /* on its own line introducing the block), but pgindent did not enforce it. As a result, the tree contains a long tail of inconsistently formatted block comments that accumulate over years of contributions. The patch under review extends the Perl-based pgindent wrapper (not pg_bsd_indent itself, despite the subject line) with a post-processing pass that rewrites non-conforming multiline comments into the canonical form.
This matters architecturally for a few reasons:
- Tree-wide
pgindentruns. PostgreSQL periodically runspgindentacross the whole tree (typically before branching) to normalize formatting. Any inconsistency thatpgindentdoes not fix becomes permanent noise ingit blameand review diffs. Extending the tool means future runs converge the tree rather than leaving stylistic drift. - Idempotency.
pgindentmust be idempotent — running it twice must yield the same result as running it once. Any transform added has to be carefully bounded so it doesn't oscillate or repeatedly churn files. - Comment semantics. Not every
/* ... */block is prose. The tree contains Doxygen-style comments (/**), license/compat markers (/*-), and ASCII "separator" banners (/* ====,/* ----). These must be preserved verbatim, since mechanical reformatting would destroy either tool-consumed metadata or human-intended visual structure.
The Proposed Solution
The patch adds postprocess_multiline_comment() to the pgindent script. Its logic:
- Operate only on block comments (skipping line comments and single-line
/* ... */). - Skip blocks whose opening token is one of the reserved forms:
/**(Doxygen),/*-(license/compat),/* ===//* ---(separators). - For remaining blocks, if the first line is
/* <text>(i.e. text on the opener), split it so/*stands alone and the text begins on the next line, properly prefixed with*.
v7 gatekeeping regex
v7 was conservative. Before touching a comment, it required:
if ( ($lines[0] ne "/*" && $lines[-1] ne " */")
or ($lines[1] !~ m!^\s+\*!))
{
return $source;
}
i.e. "only reformat comments that already look mostly canonical." This guard was there for a specific reason, which Tom Lane identified empirically: without it, the transform happily converts /* foo bar */ (a legitimate single-line comment) into a three-line block. That's wrong — the patch is supposed to normalize multi-line comments, not inflate single-line ones.
Tom Lane's v8 rework
Tom replaced the over-broad guard with a narrower, semantically correct one: an explicit test that rejects single-line comments (where */ appears on the same line as /*). This unlocks reformatting of a much larger population of comments that v7 had been refusing to touch because their current shape didn't already partially conform.
Additionally, v8 adds normalization passes:
- Leading whitespace normalization across the body lines of the block, so continuation lines line up with a consistent
*prefix. - Normalizing the number of
*characters, presumably collapsing runs like**or***introduced by ad-hoc authors down to the canonical single*.
Tom shipped two diff artifacts alongside v8 to make the blast radius legible: v7-0001.diff.nocfbot (what v7 changes against HEAD) and v8-0001.diff.nocfbot (the additional changes v8 makes on top). This is a standard technique for tree-wide mechanical patches — reviewers need to eyeball the induced diff, not just the tool change.
Key Design Decisions & Tradeoffs
1. Enforcement vs. tolerance of existing styles
Solai Murugan raised a concern about contrib/seg/seg.c: its banner-style header comment gets rewritten (lines gain a * prefix), producing what he called "unnecessary diff churn" akin to the separator-comment discussion.
Tom flatly rejected the premise: "the point of this patch is to standardize our multiline comments, not to allow multiple styles to persist." This is the key philosophical decision of the thread. A tool like pgindent only earns its keep if it's opinionated; carving out exceptions for every author's preferred banner style defeats the purpose. The only exceptions that survive are those with semantic meaning (/** for Doxygen, /*- for compat) or that are unambiguous visual separators (/* ====), not merely "looks intentional."
2. Single-line comments as a hard boundary
The bug Tom caught — inflating /* foo bar */ into a three-line comment — illustrates a category error. The transform's purpose is to hoist prose off the /* line in multi-line comments. A single-line comment has no body to hoist to; reformatting it would be a style change rather than a style normalization. Making that a direct, named test (rather than a side effect of a conservative guard) is cleaner and allows the rest of the guard to be loosened.
3. Residual manual cleanup
Tom acknowledges one casualty: Martin Utesch's ASCII-art signatures in src/backend/optimizer/geqo/* use a mix of *-prefixed and =-prefixed lines, which the normalization mis-handles. Rather than add yet another heuristic, Tom proposes handling those by hand before the tree-wide run — i.e. preprocessing the handful of offending files so their leading characters are uniform. This is a pragmatic tradeoff: a few minutes of manual work beats permanently complicating the tool for a handful of historical artifacts.
Technical Insights
- Idempotency by construction. Solai's test ("repeated
pgindentruns did not produce additional changes, andgit diff --checkwas clean") is the real acceptance criterion for this kind of patch. If a second run produced further changes, the transform would be broken regardless of how pretty the first run's output looked. - Exclusion list is a public contract. The set
{/**, /*-, /* ===, /* ---}effectively becomes documented syntax within PostgreSQL source. Oncepgindentenforces the rule, authors learn that if they want a separator banner preserved, they must start it with one of these sigils. This is a subtle but real API-like surface for the codebase. - Performance is irrelevant here. Payal measured 0.038s on a 74KB file.
pgindentruns rarely (pre-release, not per-commit), and the regex only fires inside comment blocks. This is a correctness patch, not a performance patch. - Committer intervention pattern. The progression v6 → v7 → v8 shows a typical PostgreSQL cycle: an external contributor (Aleksander, the apparent author referenced by Solai) iterates on a patch that reviewers find acceptable in a conservative form; a senior committer (Tom Lane) then takes over and pushes the scope wider, because the whole point of a tree-wide tool change is to be comprehensive rather than timid.
Participant Weight
- Tom Lane is the decisive voice. As the most senior committer and the de-facto owner of tree-wide mechanical cleanups (he has driven numerous historical
pgindentruns), his judgment that "v7 isn't going far enough" effectively sets the final scope. His contribution is not just review but concrete re-engineering (v8) with impact diffs attached. - Payal S. provides a competent but shallow feature-level review — applies the patch, checks behavior on canonical inputs, flags a minor doc nit. Useful for commitfest throughput but does not shape design.
- Solai Murugan surfaces the one substantive design question (should banner-style comments be exempt?) and gets it overruled on principle. His idempotency check is the most valuable part of his review.
Implications for the Codebase
Once committed, the next tree-wide pgindent run will produce a large but mechanical diff touching multiline comments throughout the backend. After that, the style becomes self-enforcing: any new comment written in the non-canonical form will be silently corrected on the next run, so reviewers no longer need to catch it manually. The Doxygen/compat/separator escape hatches remain available for authors who genuinely need them.