Splitting pg_resetwal Output Strings: Dynamic Column Alignment for Translatable Output
Core Problem
PostgreSQL's pg_resetwal (and similar tools like pg_controldata) produce tabular output where descriptive labels are followed by values, aligned in two columns. The current implementation uses hard-coded whitespace padding within translatable strings to achieve visual alignment. This creates a chronic maintenance burden for translators:
- Fragile alignment: Translators must manually count and insert whitespace to keep columns aligned after translation.
- Cascade failures: When a new untranslated string is added, it disrupts the alignment of the entire output block because its English text has different padding assumptions than the translated strings.
- Per-language width variance: Some languages (e.g., Spanish) produce longer label strings, requiring the translator to widen all lines — meaning every addition of a new line forces re-editing every existing translation.
This is fundamentally an architectural problem: presentation logic (column alignment) is entangled with translatable content, violating separation of concerns.
Proposed Solution
Álvaro Herrera proposes a runtime column-width calculation approach:
Architecture
- Declarative string registry (entries.h): A header file using X-macro patterns defines all control data output lines via a CONTROLDATA_LINE(symbol, description, fmt, ...) macro. Each entry specifies:
  - A symbolic identifier (enum member)
  - The translatable description string
  - A printf format specifier
  - The actual data expression to print
- Two-pass rendering:
  - Pass 1: Include entries.h with a macro definition that measures the wcswidth of each translated string, tracking the maximum width.
  - Pass 2: Include entries.h again with a macro that prints each line, padding the description to maxlen characters before printing the value.
- Selective printing via simple_oid_list: For PrintNewControlValues() (which only prints changed values), the code builds a list of symbolic identifiers to print, then the macro expansion includes an if (simple_oid_list_member(&toprint, symbol)) guard.
- internal_wcswidth() function: A new function added to avoid linking libpq just for pg_wcswidth(). This handles multibyte character width calculation needed for proper alignment with translated strings containing wide characters (CJK, etc.).
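The two-pass idea can be sketched in a single file. This is an illustration under stated assumptions, not the patch itself: the thread describes a separate entries.h included twice, whereas this sketch uses a list macro (CONTROLDATA_ENTRIES, a hypothetical name) so that it fits in one translation unit. The sample values, the max_label_width() and print_control_values() helpers, and the stub _() macro are also hypothetical, and strlen() stands in for the display-width measurement the patch performs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for gettext's _() so the sketch builds without NLS. */
#define _(s) (s)

/* Hypothetical sample data; the real tool reads these from pg_control. */
static const unsigned int sample_control_version = 1300;
static const unsigned int sample_catalog_version = 202406281;

/*
 * Single source of truth: symbol, translatable label, printf format, value.
 * (Assumption: a list macro replaces the repeated #include of entries.h.)
 */
#define CONTROLDATA_ENTRIES(X) \
	X(CD_CONTROL_VERSION, "pg_control version number:", "%u", sample_control_version) \
	X(CD_CATALOG_VERSION, "Catalog version number:", "%u", sample_catalog_version)

/* Expansion 1: one enum member per entry. */
enum ControlDataLine
{
#define AS_ENUM(sym, desc, fmt, val) sym,
	CONTROLDATA_ENTRIES(AS_ENUM)
#undef AS_ENUM
	CD_NUM_LINES
};

/* Pass 1: find the widest translated label (bytes here, columns in the patch). */
static int
max_label_width(void)
{
	int			maxlen = 0;

#define MEASURE(sym, desc, fmt, val) \
	do { int w = (int) strlen(_(desc)); if (w > maxlen) maxlen = w; } while (0);
	CONTROLDATA_ENTRIES(MEASURE)
#undef MEASURE
	return maxlen;
}

/* Pass 2: print each label padded to the common width, then its value. */
static void
print_control_values(FILE *out)
{
	int			maxlen = max_label_width();

	assert(out != NULL);
#define PRINT(sym, desc, fmt, val) \
	fprintf(out, "%-*s " fmt "\n", maxlen, _(desc), val);
	CONTROLDATA_ENTRIES(PRINT)
#undef PRINT
}
```

Calling print_control_values(stdout) prints both lines with the values starting in the same column. Because the padding is computed at run time from whatever _() returns, a translated label of a different length needs no embedded whitespace.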
Key Design Decisions
- X-macro pattern: Using #include "entries.h" with redefinable macros is an established C technique for generating parallel data structures (enum, measurement loop, print loop) from a single source of truth.
- Enum for symbolic names: Patch 0002 adds symbolic identifiers (e.g., CD_CONTROL_VERSION) enabling selective printing without relying on array indices.
- Reuse of simple_oid_list: A pragmatic but acknowledged-as-imperfect choice for storing which entries to print. Álvaro notes that a proper simple_int_list would be more type-correct.
Technical Insights
Why Not Use Oid for Enum Values?
Álvaro explicitly rejects naming the enum ControlDataOid (as Evan Chao suggested), reasoning that:
- Oid is semantically tied to database catalog objects
- If PostgreSQL ever enlarges Oid to 64-bit (for TOAST pointer reasons), the enum values (which C compilers typically implement as 32-bit integers) would diverge
- Using simple_int_list maintains type-correctness by definition: C enums are C integers, and int_list stores C integers
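A simple_int_list along these lines might look like the sketch below. Everything here is hypothetical: the thread only names the idea, so the SimpleIntList type, the simple_int_list_append() and simple_int_list_member() functions, and the enum members other than CD_CONTROL_VERSION are assumptions, modeled loosely on the shape of the existing simple_oid_list helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Only CD_CONTROL_VERSION appears in the thread; the rest are illustrative. */
enum ControlDataLine
{
	CD_CONTROL_VERSION,
	CD_CATALOG_VERSION,
	CD_NEXT_XID
};

/* Hypothetical singly linked list of plain C ints. */
typedef struct SimpleIntListCell
{
	struct SimpleIntListCell *next;
	int			val;
} SimpleIntListCell;

typedef struct SimpleIntList
{
	SimpleIntListCell *head;
	SimpleIntListCell *tail;
} SimpleIntList;

/* Append val at the tail of the list. */
static void
simple_int_list_append(SimpleIntList *list, int val)
{
	SimpleIntListCell *cell;

	assert(list != NULL);
	cell = malloc(sizeof(SimpleIntListCell));
	if (cell == NULL)
		abort();				/* sketch only; real code would report the error */
	cell->next = NULL;
	cell->val = val;
	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
}

/* Return true if val is present in the list. */
static bool
simple_int_list_member(const SimpleIntList *list, int val)
{
	const SimpleIntListCell *cell;

	for (cell = list->head; cell != NULL; cell = cell->next)
	{
		if (cell->val == val)
			return true;
	}
	return false;
}
```

With this shape, an enum member is stored and compared as an int with no conversion to Oid, so the guard in the print macro would read if (simple_int_list_member(&toprint, symbol)).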
Multibyte Width Handling
The patch correctly uses character-width measurement (wcswidth equivalent) rather than byte length. This is critical because:
- UTF-8 characters may be 1-4 bytes but display as 1-2 columns
- CJK characters are typically 2 columns wide
- The alignment must work in terminal units (columns), not bytes
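The difference between bytes and columns can be illustrated with a small width function. This is a deliberately coarse sketch, not the patch's internal_wcswidth(): it assumes UTF-8 input, treats a handful of East Asian ranges as two columns wide, counts everything else as one column, and ignores zero-width combining characters. The names utf8_display_width(), codepoint_width(), and print_padded() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Coarse approximation: a few East Asian ranges are two columns wide. */
static int
codepoint_width(uint32_t cp)
{
	if ((cp >= 0x1100 && cp <= 0x115F) ||	/* Hangul Jamo */
		(cp >= 0x2E80 && cp <= 0xA4CF) ||	/* CJK radicals through Yi */
		(cp >= 0xAC00 && cp <= 0xD7A3) ||	/* Hangul syllables */
		(cp >= 0xF900 && cp <= 0xFAFF) ||	/* CJK compatibility ideographs */
		(cp >= 0xFF00 && cp <= 0xFF60))		/* fullwidth forms */
		return 2;
	return 1;
}

/* Sum the column widths of the code points in a UTF-8 string. */
static int
utf8_display_width(const char *s)
{
	const unsigned char *p = (const unsigned char *) s;
	int			width = 0;

	assert(s != NULL);
	while (*p)
	{
		uint32_t	cp;
		int			len;

		if (*p < 0x80)              { cp = *p;        len = 1; }
		else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; len = 2; }
		else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; len = 3; }
		else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; len = 4; }
		else                          { cp = 0xFFFD;    len = 1; }	/* bad lead byte */

		for (int i = 1; i < len; i++)
		{
			if ((p[i] & 0xC0) != 0x80)
			{
				/* truncated or malformed sequence: count it as one column */
				cp = 0xFFFD;
				len = i;
				break;
			}
			cp = (cp << 6) | (p[i] & 0x3F);
		}
		width += codepoint_width(cp);
		p += len;
	}
	return width;
}

/* Print label, then enough spaces to reach maxlen columns. */
static void
print_padded(FILE *out, const char *label, int maxlen)
{
	int			pad = maxlen - utf8_display_width(label);

	fprintf(out, "%s%*s", label, pad > 0 ? pad : 0, "");
}
```

The explicit padding in print_padded() matters because printf's field width in "%-*s" counts bytes, so a label containing multibyte characters would be under-padded if the width were passed to printf directly.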
Font Width Disclaimer
Jonathan Abdiel raised the issue of variable-width fonts affecting alignment when mixing scripts (e.g., Hindi + English). Álvaro correctly dismisses this: the system measures width in monospace terminal units per Unicode standards. If a user's terminal uses proportional fonts or incorrectly-sized glyphs, that's outside PostgreSQL's control.
Gettext Integration Issues (Patch Bugs)
Two bugs were identified in the gettext/translation pipeline:
- GETTEXT_TRIGGERS position: Patch 0001 sets CONTROLDATA_LINE:1 (first argument is the translatable string), but Patch 0002 shifts the string to position 2 (adding the symbol as argument 1). The trigger must be updated to CONTROLDATA_LINE:2.
- Standalone strings not picked up: Strings like "First log segment after reset" defined as char *str = "..." outside the macro system aren't caught by make update-po. They need gettext_noop() wrapping, which marks them for extraction into the message catalog without translating them at the point of definition.
Broader Implications
Peter Eisentraut (committer) identifies that this technique generalizes to --help output and other two-column displays across all PostgreSQL tools. The proposed phased approach:
- PG 19: Commit the pg_resetwal-specific implementation
- PG 20: Generalize the technique (likely into src/common/) and apply to --help, pg_controldata, and other tools
This is a sensible incremental strategy that limits blast radius while proving the approach works correctly across platforms, encodings, and translation states.
Remaining Work
- Add simple_int_list (or justify keeping simple_oid_list)
- Fix the GETTEXT_TRIGGERS argument position in nls.mk
- Wrap standalone strings in gettext_noop()
- Deduplicate the measurement/printing logic into a shared routine
- Potentially rename entries.h to something more descriptive
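The gettext_noop() item can be sketched as follows. The macros are local stand-ins so the fragment builds without NLS: gettext_noop() is defined as a no-op marker, and _() is routed to a hypothetical fake_translate() function that returns a placeholder "translation". The variable and function names are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Marker only: the string is left untouched at the point of definition. */
#define gettext_noop(x) (x)

/* Hypothetical catalog lookup standing in for gettext(). */
static const char *
fake_translate(const char *msgid)
{
	assert(msgid != NULL);
	if (strcmp(msgid, "First log segment after reset") == 0)
		return "[xx] First log segment after reset";	/* placeholder text */
	return msgid;				/* untranslated strings fall through */
}

#define _(x) fake_translate(x)

/* Defined outside the macro registry, so it needs an explicit marker. */
static const char *first_segment_label =
	gettext_noop("First log segment after reset");

/* Translation happens later, at the point of use. */
static const char *
translated_first_segment_label(void)
{
	return _(first_segment_label);
}
```

The stored pointer still refers to the English msgid; only the later _() call at the point of use looks up the translation, which is why the marker and the lookup are two separate steps.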