splitting pg_resetwal output strings

First seen: 2026-01-31 09:41:06+00:00 · Messages: 11 · Participants: 4

Latest Update

2026-05-25 · claude-opus-4-6

Splitting pg_resetwal Output Strings: Dynamic Column Alignment for Translatable Output

Core Problem

PostgreSQL's pg_resetwal (and similar tools like pg_controldata) produce tabular output where descriptive labels are followed by values, aligned in two columns. The current implementation uses hard-coded whitespace padding within translatable strings to achieve visual alignment. This creates a chronic maintenance burden for translators:

  1. Fragile alignment: Translators must manually count and insert whitespace to keep columns aligned after translation.
  2. Cascade failures: When a new string is added but not yet translated, it is printed in English with padding sized for the English column layout, which breaks the alignment of the entire output block.
  3. Per-language width variance: Some languages (e.g., Spanish) produce longer label strings, requiring the translator to widen all lines; consequently, every addition of a new line can force re-editing every existing translation.

This is fundamentally an architectural problem: presentation logic (column alignment) is entangled with translatable content, violating separation of concerns.

Proposed Solution

Álvaro Herrera proposes a runtime column-width calculation approach:

Architecture

  1. Declarative string registry (entries.h): A header file using X-macro patterns defines all control data output lines via a CONTROLDATA_LINE(symbol, description, fmt, ...) macro. Each entry specifies:

    • A symbolic identifier (enum member)
    • The translatable description string
    • A printf format specifier
    • The actual data expression to print
  2. Two-pass rendering:

    • Pass 1: Include entries.h with a macro definition that measures the wcswidth of each translated string, tracking the maximum width.
    • Pass 2: Include entries.h again with a macro that prints each line, padding the description to maxlen display columns before printing the value.
  3. Selective printing via simple_oid_list: For PrintNewControlValues() (which only prints changed values), the code builds a list of symbolic identifiers to print, then the macro expansion includes an if (simple_oid_list_member(&toprint, symbol)) guard.

  4. internal_wcswidth() function: A new function added to avoid linking libpq just for pg_wcswidth(). This handles multibyte character width calculation needed for proper alignment with translated strings containing wide characters (CJK, etc.).
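The two-pass X-macro scheme can be sketched in a single C file. This is a minimal sketch under stated assumptions, not the patch's code: the entry list lives in a CONTROLDATA_ENTRIES list macro instead of a separately included entries.h, the symbols (CD_VERSION and so on) and values are made up, _() is an identity stub for gettext, a plain bool array stands in for the simple_oid_list filter, and label_width() uses strlen() as a placeholder for a wcswidth-style function. The same list is expanded three times: into enum members, into a measuring pass, and into a printing pass.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stub: identity in place of gettext's _() so the sketch needs no libintl. */
#define _(x) (x)

/*
 * Stand-in for entries.h: a list macro instead of a header that is
 * #included twice.  Symbols and values are illustrative only.
 */
#define CONTROLDATA_ENTRIES(X) \
	X(CD_VERSION,  "pg_control version number:",      "%u", 1800u) \
	X(CD_TIMELINE, "Latest checkpoint's TimeLineID:", "%u", 1u) \
	X(CD_NEXTOID,  "Latest checkpoint's NextOID:",    "%u", 16384u)

/* Expansion 1: each symbolic identifier becomes an enum member. */
typedef enum
{
#define AS_ENUM(sym, desc, fmt, val) sym,
	CONTROLDATA_ENTRIES(AS_ENUM)
#undef AS_ENUM
	CD_NUM_ENTRIES
} ControlDataField;

/*
 * Display width of a label.  strlen() is only correct for ASCII; the patch
 * measures translated strings with a wcswidth-style function instead.
 */
static int
label_width(const char *s)
{
	return (int) strlen(s);
}

/*
 * Print the selected lines (all lines when toprint is NULL) with values
 * aligned one space past the widest selected description.
 */
static void
print_control_values(FILE *out, const bool *toprint)
{
	int			maxlen = 0;

	/* Expansion 2 (pass 1): track the widest translated description. */
#define MEASURE(sym, desc, fmt, val) \
	if (toprint == NULL || toprint[sym]) \
	{ \
		int			w = label_width(_(desc)); \
		if (w > maxlen) \
			maxlen = w; \
	}
	CONTROLDATA_ENTRIES(MEASURE)
#undef MEASURE

	/*
	 * Expansion 3 (pass 2): description, then (maxlen - width) spaces via
	 * "%*s" of an empty string, then one separator space and the value.
	 */
#define PRINT(sym, desc, fmt, val) \
	if (toprint == NULL || toprint[sym]) \
		fprintf(out, "%s%*s " fmt "\n", _(desc), \
				maxlen - label_width(_(desc)), "", val);
	CONTROLDATA_ENTRIES(PRINT)
#undef PRINT
}
```

Calling print_control_values(stdout, NULL) prints all three lines, and each value begins 32 characters into its line (the 31-column "Latest checkpoint's TimeLineID:" plus one separator space). Passing an array with only CD_NEXTOID set prints just that line; because this sketch measures only the selected lines, it is aligned to its own width. Emitting the padding as "%*s" of an empty string means printf's field width is applied to spaces only, so byte-versus-column differences in the label cannot skew it.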

Key Design Decisions

Technical Insights

Why Not Use Oid for Enum Values?

Álvaro explicitly rejects naming the enum ControlDataOid (as Evan Chao suggested), reasoning that:

Multibyte Width Handling

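The patch correctly measures display width (a wcswidth equivalent) rather than byte length. This is critical because the two diverge for non-ASCII text: in UTF-8, an accented Latin letter such as "é" takes two bytes but one terminal column, and a CJK ideograph takes three bytes but two columns, so padding computed from byte counts would misalign translated labels.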

Font Width Disclaimer

Jonathan Abdiel raised the issue of variable-width fonts affecting alignment when mixing scripts (e.g., Hindi and English). Álvaro correctly dismisses this: the code measures width in monospace terminal columns, as defined by Unicode character-width data. If a user's terminal uses proportional fonts or incorrectly sized glyphs, that is outside PostgreSQL's control.
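To make the byte-versus-column distinction concrete, here is a small C sketch. Its width table is a hypothetical simplification covering only a few East Asian Wide ranges and the combining-diacritics block; it is not the patch's internal_wcswidth() and not PostgreSQL's own width tables. It also illustrates a pitfall relevant to pass 2: printf's "%-*s" counts the field width in bytes, so padding has to be computed from display width and written out as spaces. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified column width of one code point.  Real tables
 * cover many more ranges; this only shows why width != byte count.
 */
static int
codepoint_width(unsigned int cp)
{
	if (cp >= 0x0300 && cp <= 0x036F)	/* combining diacritics */
		return 0;
	if ((cp >= 0x1100 && cp <= 0x115F) ||	/* Hangul Jamo initials */
		(cp >= 0x2E80 && cp <= 0xA4CF) ||	/* CJK radicals through Yi */
		(cp >= 0xAC00 && cp <= 0xD7A3) ||	/* Hangul syllables */
		(cp >= 0xF900 && cp <= 0xFAFF) ||	/* CJK compatibility ideographs */
		(cp >= 0xFF00 && cp <= 0xFF60) ||	/* fullwidth forms */
		(cp >= 0xFFE0 && cp <= 0xFFE6))		/* fullwidth signs */
		return 2;
	return 1;
}

/* Sum of column widths of a UTF-8 string (decoding, not byte counting). */
static int
utf8_display_width(const char *s)
{
	const unsigned char *p = (const unsigned char *) s;
	int			width = 0;

	while (*p)
	{
		unsigned int cp;
		int			len;
		int			i;

		if (*p < 0x80)
		{
			cp = *p;
			len = 1;
		}
		else if ((*p & 0xE0) == 0xC0)
		{
			cp = *p & 0x1F;
			len = 2;
		}
		else if ((*p & 0xF0) == 0xE0)
		{
			cp = *p & 0x0F;
			len = 3;
		}
		else
		{
			cp = *p & 0x07;
			len = 4;
		}
		/* Stop early on truncated or malformed sequences (incl. the NUL). */
		for (i = 1; i < len && (p[i] & 0xC0) == 0x80; i++)
			cp = (cp << 6) | (p[i] & 0x3F);
		width += codepoint_width(cp);
		p += i;
	}
	return width;
}

/* Print label padded to target_cols terminal columns, not bytes. */
static void
print_padded(FILE *out, const char *label, int target_cols)
{
	int			pad = target_cols - utf8_display_width(label);

	fprintf(out, "%s%*s", label, pad > 0 ? pad : 0, "");
}
```

For example, the two ideographs 日本 occupy 6 bytes but 4 columns, so print_padded(stdout, "日本", 8) writes four spaces after them, whereas printf("%-8s", "日本") would add only two and leave the following value two columns short.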

Gettext Integration Issues (Patch Bugs)

Two bugs were identified in the gettext/translation pipeline:

  1. GETTEXT_TRIGGERS position: Patch 0001 sets CONTROLDATA_LINE:1 (first argument is the translatable string), but Patch 0002 shifts the string to position 2 (adding the symbol as argument 1). The trigger must be updated to CONTROLDATA_LINE:2.
  2. Standalone strings not picked up: Strings like "First log segment after reset", defined as char *str = "..." outside the macro system, are not caught by make update-po. They need gettext_noop() wrapping so that xgettext extracts them into the .pot catalog (and msgmerge then carries them into each .po file) without translating them at the point of definition.
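The gettext_noop() idiom from the second point looks roughly like this. The label variable, function name, and segment name are illustrative, and gettext_noop() and _() are local stubs so the sketch compiles without libintl; in PostgreSQL's c.h, gettext_noop(x) likewise expands to just (x), while _() performs the gettext() lookup when NLS is enabled.

```c
#include <stdio.h>
#include <string.h>

/*
 * Stubs so the sketch builds without libintl.  gettext_noop() only marks a
 * string for extraction; _() stands in for the runtime gettext() lookup.
 */
#define gettext_noop(x) (x)
#define _(x) (x)

/* Marked for extraction here, but not translated here. */
static const char *const first_segment_label =
	gettext_noop("First log segment after reset:");

/* Translate at the point of use. */
static void
print_first_segment(FILE *out, const char *segname)
{
	fprintf(out, "%s %s\n", _(first_segment_label), segname);
}
```

The split exists because a static initializer cannot call gettext(), and the lookup has to happen after setlocale() has run in any case; marking and translating are therefore two separate steps.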

Broader Implications

Peter Eisentraut (committer) identifies that this technique generalizes to --help output and other two-column displays across all PostgreSQL tools. The proposed phased approach:

This is a sensible incremental strategy that limits blast radius while proving the approach works correctly across platforms, encodings, and translation states.

Remaining Work

  1. Add simple_int_list (or justify keeping simple_oid_list)
  2. Fix the GETTEXT_TRIGGERS argument position in nls.mk
  3. Wrap standalone strings in gettext_noop()
  4. Deduplicate the measurement/printing logic into a shared routine
  5. Potentially rename entries.h to something more descriptive
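For the first item, a hypothetical simple_int_list could mirror the existing SimpleOidList in src/include/fe_utils/simple_list.h (head and tail pointers plus append and member functions). The names below are assumptions, not an existing PostgreSQL API, and plain malloc()/abort() stand in for the frontend's pg_malloc(). Storing enum members as Oid works only because small non-negative enum values convert cleanly to the unsigned Oid type; an int list would state that intent directly.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical int counterpart of SimpleOidListCell / SimpleOidList. */
typedef struct SimpleIntListCell
{
	struct SimpleIntListCell *next;
	int			val;
} SimpleIntListCell;

typedef struct SimpleIntList
{
	SimpleIntListCell *head;
	SimpleIntListCell *tail;
} SimpleIntList;

/* Append val at the tail (O(1) thanks to the tail pointer). */
static void
simple_int_list_append(SimpleIntList *list, int val)
{
	SimpleIntListCell *cell = malloc(sizeof(SimpleIntListCell));

	if (cell == NULL)
		abort();				/* pg_malloc() would exit on OOM instead */
	cell->next = NULL;
	cell->val = val;
	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
}

/* Linear membership test; the lists here hold only a handful of entries. */
static bool
simple_int_list_member(const SimpleIntList *list, int val)
{
	for (const SimpleIntListCell *cell = list->head; cell; cell = cell->next)
		if (cell->val == val)
			return true;
	return false;
}

/* Free every cell and leave the list empty. */
static void
simple_int_list_destroy(SimpleIntList *list)
{
	SimpleIntListCell *cell = list->head;

	while (cell != NULL)
	{
		SimpleIntListCell *next = cell->next;

		free(cell);
		cell = next;
	}
	list->head = list->tail = NULL;
}
```

With such a list, the selective-printing guard would read if (simple_int_list_member(&toprint, symbol)) with no conversion to Oid involved.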