Review - Patch for pg_bsd_indent: improve formatting of multiline comments

First seen: 2026-03-27 18:34:56+00:00 · Messages: 4 · Participants: 4

Latest Update

2026-05-11 · opus 4.7

Analysis: pg_bsd_indent / pgindent — Formatting of Multiline Comments

Core Problem

PostgreSQL's source tree has historically tolerated two common styles for multiline block comments:

/* This is line 1         vs.      /*
 * This is line 2                   * This is line 1
 */                                 * This is line 2
                                    */

The project's written style guide prefers the second form (a "bare" /* on its own line introducing the block), but pgindent did not enforce it. As a result, the tree contains a long tail of inconsistently formatted block comments that accumulate over years of contributions. The patch under review extends the Perl-based pgindent wrapper (not pg_bsd_indent itself, despite the subject line) with a post-processing pass that rewrites non-conforming multiline comments into the canonical form.

This matters architecturally for a few reasons:

  1. Tree-wide pgindent runs. PostgreSQL periodically runs pgindent across the whole tree (typically before branching) to normalize formatting. Any inconsistency that pgindent does not fix becomes permanent noise in git blame and review diffs. Extending the tool means future runs converge the tree rather than leaving stylistic drift.
  2. Idempotency. pgindent must be idempotent — running it twice must yield the same result as running it once. Any transform added has to be carefully bounded so it doesn't oscillate or repeatedly churn files.
  3. Comment semantics. Not every /* ... */ block is prose. The tree contains Doxygen-style comments (/**), license/compat markers (/*-), and ASCII "separator" banners (/* ====, /* ----). These must be preserved verbatim, since mechanical reformatting would destroy either tool-consumed metadata or human-intended visual structure.

The Proposed Solution

The patch adds postprocess_multiline_comment() to the pgindent script. Its logic:

v7 gatekeeping regex

v7 was conservative. Before touching a comment, it required:

if (   ($lines[0] ne "/*" && $lines[-1] ne " */")
    or ($lines[1] !~ m!^\s+\*!))
{
    return $source;
}

i.e. "only reformat comments that already look mostly canonical." This guard was there for a specific reason, which Tom Lane identified empirically: without it, the transform happily converts /* foo bar */ (a legitimate single-line comment) into a three-line block. That's wrong — the patch is supposed to normalize multi-line comments, not inflate single-line ones.

Tom Lane's v8 rework

Tom replaced the over-broad guard with a narrower, semantically correct one: an explicit test that rejects single-line comments (where */ appears on the same line as /*). This unlocks reformatting of a much larger population of comments that v7 had been refusing to touch because their current shape didn't already partially conform.

Additionally, v8 adds normalization passes:

Tom shipped two diff artifacts alongside v8 to make the blast radius legible: v7-0001.diff.nocfbot (what v7 changes against HEAD) and v8-0001.diff.nocfbot (the additional changes v8 makes on top). This is a standard technique for tree-wide mechanical patches — reviewers need to eyeball the induced diff, not just the tool change.

Key Design Decisions & Tradeoffs

1. Enforcement vs. tolerance of existing styles

Solai Murugan raised a concern about contrib/seg/seg.c: its banner-style header comment gets rewritten (lines gain a * prefix), producing what he called "unnecessary diff churn" akin to the separator-comment discussion.

Tom flatly rejected the premise: "the point of this patch is to standardize our multiline comments, not to allow multiple styles to persist." This is the key philosophical decision of the thread. A tool like pgindent only earns its keep if it's opinionated; carving out exceptions for every author's preferred banner style defeats the purpose. The only exceptions that survive are those with semantic meaning (/** for Doxygen, /*- for compat) or that are unambiguous visual separators (/* ====), not merely "looks intentional."

2. Single-line comments as a hard boundary

The bug Tom caught — inflating /* foo bar */ into a three-line comment — illustrates a category error. The transform's purpose is to hoist prose off the /* line in multi-line comments. A single-line comment has no body to hoist to; reformatting it would be a style change rather than a style normalization. Making that a direct, named test (rather than a side effect of a conservative guard) is cleaner and allows the rest of the guard to be loosened.

3. Residual manual cleanup

Tom acknowledges one casualty: Martin Utesch's ASCII-art signatures in src/backend/optimizer/geqo/* use a mix of *-prefixed and =-prefixed lines, which the normalization mis-handles. Rather than add yet another heuristic, Tom proposes handling those by hand before the tree-wide run — i.e. preprocessing the handful of offending files so their leading characters are uniform. This is a pragmatic tradeoff: a few minutes of manual work beats permanently complicating the tool for a handful of historical artifacts.

Technical Insights

Participant Weight

Implications for the Codebase

Once committed, the next tree-wide pgindent run will produce a large but mechanical diff touching multiline comments throughout the backend. After that, the style becomes self-enforcing: any new comment written in the non-canonical form will be silently corrected on the next run, so reviewers no longer need to catch it manually. The Doxygen/compat/separator escape hatches remain available for authors who genuinely need them.