Use TEXT_DUMP_HEADER macro instead of hard-coded string in pg_backup_archiver.c

First seen: 2026-06-03 07:58:01+00:00 · Messages: 3 · Participants: 2

Latest Update

2026-06-04 · claude-opus-4-6

Technical Analysis: Use TEXT_DUMP_HEADER Macro Instead of Hard-Coded String in pg_backup_archiver.c

Core Problem

In PostgreSQL's backup/restore infrastructure (pg_backup_archiver.c), the function RestoreArchive() emits the dump header using a hard-coded string literal:

ahprintf(AH, "--\n-- PostgreSQL database dump\n--\n\n");

The same file already defines a macro TEXT_DUMP_HEADER with identical content. This macro is used elsewhere (notably in _discoverArchiveFormat() to detect whether a file is a text-format dump), but the emission path in RestoreArchive() bypasses it and uses the raw string. This is a classic DRY (Don't Repeat Yourself) violation — if the header text ever changes, both locations would need to be updated in lockstep, and a mismatch could cause format detection failures.

Architectural Context

PostgreSQL's pg_dump/pg_restore tooling supports multiple archive formats (custom, directory, tar, and plain text). The text format is identified by its header string. The detection logic in _discoverArchiveFormat() uses TEXT_DUMP_HEADER to recognize text-format dumps. If RestoreArchive() emits a header that differs from what the detection code looks for, round-tripping (dump → restore → re-dump) could silently break format auto-detection.

The macro definitions relevant here:

Proposed Solution

v1 (initial patch): Simple substitution of the hard-coded string in RestoreArchive() with the TEXT_DUMP_HEADER macro, keeping the macro definition in pg_backup_archiver.c.

v2 (after review feedback): Move the macro definitions (TEXT_DUMP_HEADER and likely TEXT_DUMPALL_HEADER) to pg_backup_archiver.h so they can be shared across translation units. This addresses the related inconsistency in pg_dumpall.c, which also has a hard-coded version of the dumpall header string.

Design Decisions and Tradeoffs

  1. Scope of change: The initial patch was deliberately conservative — only replacing the one hard-coded instance within the same file. The reviewer suggested a broader approach (moving to the header file), which the author adopted.

  2. Cross-file consistency: pg_dumpall.c contains its own hard-coded TEXT_DUMPALL_HEADER equivalent. The initial patch noted this but left it untouched as it spans multiple files. The v2 patch, by moving definitions to the header, enables (but may not yet implement) cleanup in pg_dumpall.c as well.

  3. Risk assessment: This is an extremely low-risk change — it's purely a refactoring with no behavioral change. The string content is identical; only the source of truth is consolidated.

Key Technical Insight

The real value here is not the one-line fix but the principle it enforces: format-detection magic strings and format-emission strings must be the same symbol. PostgreSQL's archive format auto-detection is based on reading the first few bytes/lines of a dump file. If the emitted header drifts from the detected header, pg_restore would fail to auto-detect the format, forcing users to specify -F manually. Using a single macro for both paths makes such drift impossible at compile time.

Assessment

This is a straightforward code quality improvement typical of the PostgreSQL project's high standards for internal consistency. It requires minimal review effort and has zero risk of regression. The reviewer's suggestion to promote the macros to the header file is the architecturally correct evolution, enabling future cleanup of similar hard-coded strings in pg_dumpall.c.