2026-05-25 · claude-opus-4-6

Splitting pg_resetwal Output Strings: Dynamic Column Alignment for Translatable Output

Core Problem

PostgreSQL's pg_resetwal and similar tools such as pg_controldata produce tabular output in which descriptive labels are followed by values, aligned in two columns. The current implementation achieves the alignment with hard-coded whitespace padding inside the translatable strings. This creates a recurring maintenance burden for translators:

Fragile alignment: Translators must manually count and insert whitespace to keep columns aligned after translation.
Cascade failures: When a new untranslated string is added, it disrupts the alignment of the entire output block because its English text has different padding assumptions than the translated strings.
Per-language width variance: Some languages (e.g., Spanish) produce longer label strings, requiring the translator to widen all lines. As a result, a single new line that is longer than the current column width forces re-editing every existing translated line.

This is fundamentally an architectural problem: presentation logic (column alignment) is entangled with translatable content, violating separation of concerns.

Proposed Solution

Álvaro Herrera proposes a runtime column-width calculation approach:

Architecture

Declarative string registry (entries.h): A header file using X-macro patterns defines all control data output lines via a CONTROLDATA_LINE(symbol, description, fmt, ...) macro. Each entry specifies:
- A symbolic identifier (enum member)
- The translatable description string
- A printf format specifier
- The actual data expression to print
Two-pass rendering:
- Pass 1: Include entries.h with a macro definition that measures the display width (the wcswidth) of each translated string, tracking the maximum in maxlen.
- Pass 2: Include entries.h again with a macro that prints each line, padding the description to maxlen display columns (not bytes or characters) before printing the value.
Selective printing via simple_oid_list: For PrintNewControlValues() (which only prints changed values), the code builds a list of symbolic identifiers to print, then the macro expansion includes an if (simple_oid_list_member(&toprint, symbol)) guard.
internal_wcswidth() function: A new function added to avoid linking libpq just for pg_wcswidth(). This handles multibyte character width calculation needed for proper alignment with translated strings containing wide characters (CJK, etc.).
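
The two passes and the selective-printing guard described above can be sketched in a single file. This is a minimal illustration under stated assumptions, not the patch's code: DemoControlData, DEMO_ENTRIES, demo_max_width and demo_print are hypothetical names, a list macro stands in for the repeated #include of entries.h, a bitmask stands in for simple_oid_list, and strlen() stands in for internal_wcswidth(), so the padding is only right for ASCII labels.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real control data structure. */
typedef struct
{
    unsigned int version;
    unsigned int catalog_version;
    unsigned int block_size;
} DemoControlData;

/* Stand-in for entries.h: one X(symbol, description, fmt, value) per line. */
#define DEMO_ENTRIES(X) \
    X(CD_CONTROL_VERSION, "pg_control version number:", "%u", cd->version) \
    X(CD_CATALOG_VERSION, "Catalog version number:", "%u", cd->catalog_version) \
    X(CD_BLOCK_SIZE, "Database block size:", "%u", cd->block_size)

/* Expansion 1: the enum of symbolic identifiers. */
typedef enum
{
#define AS_ENUM(sym, desc, fmt, val) sym,
    DEMO_ENTRIES(AS_ENUM)
#undef AS_ENUM
    CD_NUM_ENTRIES
} DemoEntry;

/* Placeholder for gettext(); a real build would translate here. */
#define _(x) (x)

/* Pass 1: find the widest description (ASCII-only assumption). */
static int
demo_max_width(void)
{
    int maxlen = 0;

#define MEASURE(sym, desc, fmt, val) \
    do { int w = (int) strlen(_(desc)); if (w > maxlen) maxlen = w; } while (0);
    DEMO_ENTRIES(MEASURE)
#undef MEASURE
    return maxlen;
}

/* Pass 2: print the entries whose bit is set in toprint, padded to maxlen. */
static void
demo_print(FILE *out, const DemoControlData *cd, unsigned int toprint)
{
    int maxlen = demo_max_width();

    assert(cd != NULL);
#define PRINT(sym, desc, fmt, val) \
    if (toprint & (1u << sym)) \
        fprintf(out, "%-*s " fmt "\n", maxlen, _(desc), val);
    DEMO_ENTRIES(PRINT)
#undef PRINT
}
```

Calling demo_print(stdout, &cd, 1u << CD_BLOCK_SIZE) prints only the block-size line, still padded to the width of the longest label. Adding an entry means adding one line to the list; the enum, the measurement and the printing all follow from it.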

Key Design Decisions

X-macro pattern: Using #include "entries.h" with redefinable macros is an established C technique for generating parallel data structures (enum, measurement loop, print loop) from a single source of truth.
Enum for symbolic names: Patch 0002 adds symbolic identifiers (e.g., CD_CONTROL_VERSION) enabling selective printing without relying on array indices.
Reuse of simple_oid_list: A pragmatic but acknowledged-as-imperfect choice for storing which entries to print. Álvaro notes that a proper simple_int_list would be more type-correct.

Technical Insights

Why Not Use Oid for Enum Values?

Álvaro explicitly rejects naming the enum ControlDataOid (as Evan Chao suggested), reasoning that:

Oid is semantically tied to database catalog objects
If PostgreSQL ever enlarges Oid to 64-bit (for TOAST pointer reasons), Oid would no longer match the width of the enum values, which C compilers typically implement as 32-bit integers
Using simple_int_list maintains type-correctness by definition: C enums are C integers, and int_list stores C integers
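
For illustration, a list of plain C integers could look like the sketch below. It is an assumption modeled loosely on the shape of the existing simple_oid_list (an append function and a membership test); the type and function names are hypothetical, and real frontend code would allocate with pg_malloc rather than malloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical singly linked list of ints. */
typedef struct SimpleIntListCell
{
    struct SimpleIntListCell *next;
    int         val;
} SimpleIntListCell;

typedef struct SimpleIntList
{
    SimpleIntListCell *head;
    SimpleIntListCell *tail;
} SimpleIntList;

/* Append val at the tail of the list. */
static void
simple_int_list_append(SimpleIntList *list, int val)
{
    SimpleIntListCell *cell = malloc(sizeof(SimpleIntListCell));

    assert(cell != NULL);       /* sketch only: no graceful OOM handling */
    cell->next = NULL;
    cell->val = val;
    if (list->tail)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
}

/* Return true if val is present in the list. */
static bool
simple_int_list_member(const SimpleIntList *list, int val)
{
    for (const SimpleIntListCell *cell = list->head; cell; cell = cell->next)
        if (cell->val == val)
            return true;
    return false;
}
```

An enum member such as CD_CONTROL_VERSION converts to int without a cast, so the stored type matches the stored values and nothing depends on the width of Oid.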

Multibyte Width Handling

The patch correctly uses character-width measurement (wcswidth equivalent) rather than byte length. This is critical because:

UTF-8 characters occupy 1-4 bytes but typically display as 1-2 columns (combining marks take none)
CJK characters are typically 2 columns wide
The alignment must work in terminal units (columns), not bytes
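
The gap between bytes and columns can be made concrete with a small sketch. It is not the patch's internal_wcswidth(): the demo_ functions are hypothetical, the input is assumed to be valid UTF-8, only a few East Asian wide ranges are recognized, and zero-width combining marks are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode one UTF-8 sequence (assumed valid); return the bytes consumed. */
static size_t
demo_utf8_decode(const unsigned char *s, unsigned int *cp)
{
    if (s[0] < 0x80)
    {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)
    {
        *cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0)
    {
        *cp = ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12) |
          ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Crude width table: 2 columns for a few wide ranges, 1 otherwise. */
static int
demo_cp_width(unsigned int cp)
{
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo */
        (cp >= 0x2E80 && cp <= 0xA4CF) ||   /* CJK blocks (simplified) */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||   /* CJK compatibility */
        (cp >= 0xFF00 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Sum the column widths of every code point in a UTF-8 string. */
static int
demo_display_width(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    int         width = 0;

    while (*s)
    {
        unsigned int cp;

        s += demo_utf8_decode(s, &cp);
        width += demo_cp_width(cp);
    }
    return width;
}
```

With these definitions, "Versi\xc3\xb3n" (Versión) is 8 bytes long but 7 columns wide, and the two CJK characters "\xe7\x89\x88\xe6\x9c\xac" are 6 bytes long but 4 columns wide. Padding computed from strlen() would misalign both.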

Font Width Disclaimer

Jonathan Abdiel raised the issue of variable-width fonts affecting alignment when mixing scripts (e.g., Hindi + English). Álvaro correctly dismisses this: the system measures width in monospace terminal columns per Unicode standards. If a user's terminal uses a proportional font or incorrectly sized glyphs, that is outside PostgreSQL's control.

Gettext Integration Issues (Patch Bugs)

Two bugs were identified in the gettext/translation pipeline:

GETTEXT_TRIGGERS position: Patch 0001 sets CONTROLDATA_LINE:1 (first argument is the translatable string), but Patch 0002 shifts the string to position 2 (adding the symbol as argument 1). The trigger must be updated to CONTROLDATA_LINE:2.
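Standalone strings not picked up: Strings like "First log segment after reset" defined as char *str = "..." outside the macro system aren't caught by make update-po. They need gettext_noop() wrapping so that they are extracted into the message catalogs. gettext_noop() only marks the string; translation still happens later, at the point where the string is passed through the translation call and printed.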
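
The mark-now, translate-later pattern can be sketched as follows. PostgreSQL defines gettext_noop(x) as (x); everything else here is an assumption for illustration: demo_translate() is a toy stand-in for the runtime catalog lookup, and the Spanish text is made up rather than taken from the real es.po.

```c
#include <assert.h>
#include <string.h>

/* Marks a literal for extraction; has no runtime effect. */
#define gettext_noop(x) (x)

/* Toy stand-in for the catalog lookup normally done by gettext(). */
static const char *
demo_translate(const char *msgid)
{
    if (strcmp(msgid, "First log segment after reset") == 0)
        return "Primer segmento de log tras el reinicio"; /* made-up */
    return msgid;               /* untranslated strings pass through */
}

#define _(x) demo_translate(x)

/* Defined outside the macro system, so it must be marked explicitly. */
static const char *const first_segment_label =
    gettext_noop("First log segment after reset");

/* Translation happens here, at the point of use. */
static const char *
demo_label(void)
{
    return _(first_segment_label);
}
```

The variable itself always holds the English msgid. Only the call through _() yields the translated text, which is why marking the literal alone is not enough.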

Broader Implications

Peter Eisentraut (committer) identifies that this technique generalizes to --help output and other two-column displays across all PostgreSQL tools. The proposed phased approach:

PG 19: Commit the pg_resetwal-specific implementation
PG 20: Generalize the technique (likely into src/common/) and apply to --help, pg_controldata, and other tools

This is a sensible incremental strategy that limits blast radius while proving the approach works correctly across platforms, encodings, and translation states.

Remaining Work

Add simple_int_list (or justify keeping simple_oid_list)
Fix the GETTEXT_TRIGGERS argument position in nls.mk
Wrap standalone strings in gettext_noop()
Deduplicate the measurement/printing logic into a shared routine
Potentially rename entries.h to something more descriptive
