pgindent versus struct members and typedefs

First seen: 2025-12-02 22:00:51+00:00 · Messages: 17 · Participants: 5

Latest Update

2026-05-07 · opus 4.7

What's New: Release Coordination, Not Technical Substance

The new messages are entirely about commit choreography, not new technical ground. The patch itself is unchanged in substance — only the commit message was polished.

Key developments

  1. Upstream divergence question resolved (negatively). Nathan checked pstef/freebsd_indent against PostgreSQL's copy and reports the upstream fix analogous to this one actually predates the fork — it just never covered this particular case. So PostgreSQL is not missing a pile of post-fork fixes on this axis; Tom's open question from the prior round ("should we sync wholesale?") is effectively answered "there's nothing waiting upstream for this bug." Other FreeBSD commits exist but Nathan doesn't flag any as relevant.

  2. INDENT_VERSION bump coordination. Nathan asked whether this change warrants an INDENT_VERSION bump (prior pgindent changes have done so). Tom revealed there is already a queued bump for an unrelated change — the "space between comma and period" fix — and declined a second bump in the same cycle, asking that the two be pushed together.

  3. Near-miss commit averted. Nathan initially said he'd commit 0001 immediately without the version bump. Tom sharply countered: committing the indenter change without a version bump would silently break anyone using the in-tree pg_bsd_indent, and committing the resulting code reformatting without the version bump would break out-of-tree copies. The whole set — indenter fix, version bump, reformatting — must land atomically. Nathan retracted and will hold the patches.

  4. New v2 of 0001. Posted with only a cleaned-up commit message; 0002 (the tree-wide reformatting diff) is dropped from the posting because it is mechanically regeneratable.

Why this matters

The episode is a small but clean illustration of the pg_bsd_indent release protocol: the indenter, its version tag, and the tree's formatted output form a tightly coupled triple, and any mismatch across the three breaks either in-tree or out-of-tree consumers. Tom's veto enforces that invariant. Nothing technical about last_u_d or the parsing bug has moved.

History (1 prior analysis)
2026-05-06 · opus 4.7

pgindent versus struct members and typedefs

Core Problem

pgindent — PostgreSQL's tree-wide C formatter, forked long ago from BSD indent — has a long-standing class of bugs related to how it parses identifiers that collide with known typedef names. The immediate symptom Nathan Bossart reported is that the != operator fails to receive its customary trailing space when the right-hand operand is a typedef name (or otherwise confuses the parser's "last token was a unary/declarator" state). For example:

entry->dsh.dsa_handle !=DSA_HANDLE_INVALID
ctx->options.output_type !=OUTPUT_PLUGIN_TEXTUAL_OUTPUT
state.manifest_file !=NULL

A simple grep -E "!=[A-Za-z]" -rI over the tree located every instance, and Nathan observed two curious triggers:

  1. Renaming the struct member so it no longer shadows a typedef fixes the spacing.
  2. Changing != to == also fixes it.

That strongly implicated pgindent's internal state machine around the last_u_d flag (the "last token could be unary-or-declarator" bit), which is exactly the class of bug that upstream FreeBSD indent had already patched in commit afa2239 of pstef's fork. The same state-tracking issue is documented in pgindent's README via commit c4133ec, which warns that function names colliding with typedef names get mangled — a symptom of the same root cause: pgindent's tokenizer/parser blurs the distinction between "identifier used as a type" and "identifier used as an ordinary name" based on a global typedef table, and then feeds that into spacing heuristics that were designed for K&R-era declarators.

Why This Matters Architecturally

pgindent is run once per release cycle by Tom Lane ("the annual pgindent"), and its output is effectively canonical — any deviation gets "fixed" on the next pass, so buggy indentation gets baked into the tree and committers can't cleanly override it. Because the tool is also used as a gating check (via src/tools/pgindent), false positives force contributors either to:

  • rename legitimate identifiers to avoid collisions with typedefs (what c4133ec recommends),
  • accept ugly formatting in the tree, or
  • fix the indenter itself.

The indenter's code is notoriously hard to modify — Nathan described it as "basically impenetrable" — which is why these defects accumulate.

Proposed Solutions

Two distinct threads of discussion ran in parallel:

1. The !=TYPEDEF spacing bug (the main issue)

Initially Nathan concluded "the best thing to do here is nothing," given the small number of affected sites and the difficulty of patching the indenter. Five months later he returned with an AI-assisted patch set:

  • 0001 — a pgindent source change that corrects the handling of last_u_d so that a binary operator followed by a typedef-named identifier no longer suppresses the space. This mirrors the logic in the upstream FreeBSD fix afa2239.
  • 0002 — the result of re-running pgindent with 0001 applied, which cleans up exactly the set of sites grep had found (in dsm_registry.c, logicalfuncs.c, and pg_basebackup.c).

Chao Li independently ran the patched pgindent across all .c/.h files under src/ and contrib/ and got byte-identical results to 0002, which is strong evidence the patch has no unintended formatting side-effects. Tom Lane judged the changes "clearly improvements" and raised the broader question of whether PostgreSQL should more systematically pull in post-fork upstream fixes — PostgreSQL's copy of BSD indent has diverged significantly and is missing years of upstream maintenance.

2. The spurious blank line after bare else with a block comment

Chao Li raised a tangential pgindent bug: when an else clause consists of a multi-line comment followed by a single unbraced statement, pgindent inserts a spurious blank line between else and the comment. He could not fix it in the indenter itself.

Tom Lane turned this into a style argument rather than a bug: in his view, any else (or if) body containing both a comment and a statement is morally two lines of logic and should be braced. Thus pgindent's "misbehavior" is, in practice, enforcing preferred style. Chao and Andrew Dunstan agreed; Álvaro Herrera pushed back mildly, noting he deliberately uses mixed styles — short branch unbraced first, long else branch braced — when the asymmetry aids readability.

This sub-discussion is not resolved with a patch; the consensus is to brace such constructs and leave the indenter alone.

Key Technical Insights

  • last_u_d is the locus of most pgindent misparses. The variable tracks whether the previous token left the parser in a state where the next *, &, +, -, or ! should be treated as unary (or as part of a declarator like int *p) versus binary. Typedef names cause the lexer to emit a different token class (decl), which in turn flips last_u_d inappropriately after a binary operator. The FreeBSD commit afa2239 corrects exactly this.
  • Typedef-name collisions are a recurring hazard. The c4133ec README note about function-name collisions, the !=TYPEDEF bug, and the behavior difference when a struct member shadows a typedef are all manifestations of the same architectural limitation: pgindent resolves identifiers against a flat typedef list with no scoping awareness.
  • Divergence from upstream is a latent technical debt. Tom's closing remark — wondering whether to adopt upstream's post-fork changes wholesale — acknowledges that PostgreSQL carries a stale fork of a tool that is still being maintained elsewhere. Each ad-hoc cherry-pick (like 0001) is cheaper in isolation but more expensive over time than a rebase.
  • Validation by re-run is the right QA strategy for formatter changes. Chao's full-tree diff confirming 0002 is the proper smoke test: a pgindent patch is safe iff its formatting delta across the whole tree is exactly the intended set.

Participant Dynamics

  • Nathan Bossart (committer) drove the investigation and ultimately produced the fix. His initial "not worth it" stance softened once he found the upstream analog.
  • Tom Lane (senior committer, de facto owner of pgindent runs) is the gatekeeper here — his approval of 0001 is what would unblock a commit before the next annual pgindent. His style opinions also carry unusual weight because pgindent's output effectively codifies them.
  • Chao Li contributed the independent verification (full-tree re-run match), which is the empirical justification for committing 0001.
  • Andrew Dunstan and Álvaro Herrera (both committers) weighed in only on the brace-style tangent, with Álvaro providing the lone dissent on mixed bracing style.

Open Questions

  1. Should PostgreSQL do a broader sync with upstream FreeBSD indent, or continue cherry-picking?
  2. The c4133ec class of bugs (function-name/typedef collisions) remains unfixed — does the same last_u_d framework extend to those cases?
  3. Timing: the patch landed in the discussion in May, shortly before the annual pgindent run, so getting it in before that run matters for tree-wide effect.