2026-05-27 · claude-opus-4-6

Incremental Update: Tom Lane Rejects Patch Due to Platform Portability Concerns

Summary

Tom Lane has responded to the patch with a clear rejection based on cross-platform variability in locale names. This is exactly the concern anticipated in the prior analysis, but now it is backed by concrete historical evidence from the buildfarm.

Key Technical Argument

Tom Lane explains that the fundamental reason this test doesn't already exist is that locale names vary across platforms in both spelling and availability. The proposed test hardcodes en_US.UTF-8, which:

May not exist on all buildfarm machines
May produce a different error than the one expected (e.g., an invalid-locale-name error instead of the encoding mismatch error)

He provides a specific historical parallel: his own attempts to add NLS translation coverage, referencing six commits that document the painful iteration cycle:

8c498479d → 5b275a6e1 → fe7ede45f → 7db6809ce → 84a3778c7 → 462e24765

This series illustrates how even experienced committers can underestimate the portability challenges of locale-dependent tests.

Proposed Alternative (Dismissed)

Tom acknowledges that variant expected-output files (pg_regress's alternative expected files, such as testname_1.out) could in principle absorb the platform differences, but dismisses this approach as:

"A pain in the rear for maintenance"
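Questionable in what it would actually prove: if every platform behavior gets its own variant file, a platform that lacks the locale passes by matching its own variant without ever reaching the mismatch check, so the test's value as a regression guard diminishes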
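The following C sketch makes the objection concrete. It models the matching rule loosely. Roughly, pg_regress compares a test's output against testname.out and numbered alternatives such as testname_1.out. It passes the test if any of them matches, and otherwise reports the diff against the closest one. The names match_expected() and line_mismatches() are hypothetical. The line counter is a crude in-memory substitute for the real diff that pg_regress runs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Crude stand-in for diff(1): counts line positions at which two texts
 * differ, treating missing lines as mismatches. pg_regress runs a real diff.
 */
static size_t line_mismatches(const char *a, const char *b)
{
    size_t mismatches = 0;

    while (*a != '\0' || *b != '\0') {
        const char *ea = strchr(a, '\n');
        const char *eb = strchr(b, '\n');
        size_t la = ea ? (size_t)(ea - a) : strlen(a);
        size_t lb = eb ? (size_t)(eb - b) : strlen(b);

        if (la != lb || memcmp(a, b, la) != 0)
            mismatches++;
        a += la + (ea ? 1 : 0);   /* step past this line and its newline */
        b += lb + (eb ? 1 : 0);
    }
    return mismatches;
}

/*
 * Returns the index of the first expected variant that matches the result
 * exactly (a pass), or -1 (a failure), in which case *closest is set to the
 * variant with the fewest mismatched lines, i.e. the one a report diffs against.
 */
int match_expected(const char *result, const char *const variants[],
                   size_t n, size_t *closest)
{
    size_t best = (size_t)-1;

    *closest = 0;
    for (size_t i = 0; i < n; i++) {
        size_t m = line_mismatches(variants[i], result);
        if (m == 0)
            return (int)i;
        if (m < best) {
            best = m;
            *closest = i;
        }
    }
    return -1;
}
```

Suppose one variant records the mismatch error and another records some "locale missing" output. A platform that produces the second output passes, even though the encoding/locale check never ran. That is the weakness Tom points to.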

Implications

This effectively closes the patch unless the author can propose an approach that avoids locale-name dependency. Possible paths forward (not suggested by Tom, but implied by the discussion):

Using TAP tests with platform detection logic
Finding a locale name that exists everywhere (unlikely: C and POSIX are universal but imply no particular encoding, so they can never trigger the mismatch error)
Testing via the builtin locale provider (PG17+), which doesn't depend on OS locale availability (though its encoding check is separate from the libc-based check the patch targets)
Testing at a lower level (e.g., check_encoding_locale_matches() unit test)
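The first path, platform detection, comes down to probing which locale spellings the C library accepts before running the test. Below is a C sketch of such a probe. A real TAP test would be written in Perl; C is used here only for consistency with the other sketches. The function first_available_locale() is hypothetical. It roughly mirrors how PostgreSQL's src/port/chklocale.c determines a locale's encoding on POSIX systems: setlocale() followed by nl_langinfo(CODESET).

```c
#include <langinfo.h>   /* nl_langinfo, CODESET: POSIX, not available on Windows */
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns the first candidate locale name that the C library accepts for
 * LC_CTYPE and copies that locale's codeset (e.g. "UTF-8") into codeset.
 * Returns NULL if no candidate is installed, in which case a test would be
 * skipped. The caller's LC_CTYPE setting is restored before returning.
 */
const char *first_available_locale(const char *const candidates[], size_t n,
                                   char *codeset, size_t codeset_len)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    const char *found = NULL;

    /* setlocale()'s returned string may be overwritten later, so copy it. */
    snprintf(saved, sizeof saved, "%s", cur ? cur : "C");

    for (size_t i = 0; i < n && found == NULL; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL) {
            found = candidates[i];
            snprintf(codeset, codeset_len, "%s", nl_langinfo(CODESET));
        }
    }
    setlocale(LC_CTYPE, saved);
    return found;
}
```

A caller might pass a hypothetical candidate list such as "en_US.UTF-8" and "en_US.utf8" and skip the test when the result is NULL. A real check would also confirm that the reported codeset is UTF-8 before relying on the locale.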

The tone of the response is definitive — Tom is not asking for a revision but explaining why the approach is fundamentally problematic.

2026-05-25 · claude-opus-4-6

Analysis: Add Regression Test for Mismatched ENCODING and LOCALE in CREATE DATABASE

Core Problem

The PostgreSQL documentation for CREATE DATABASE explicitly states that encoding and locale settings must be compatible, and that an error will be reported if they are not. However, the existing regression test suite lacks coverage for this specific failure mode. This means:

No regression guard exists to ensure the mismatch detection continues working correctly if the underlying validation logic is refactored.
Documentation-code parity is not verified — the documented behavior could theoretically diverge from actual behavior without any test catching it.

Technical Context

How Encoding/Locale Validation Works in PostgreSQL

When CREATE DATABASE is executed, PostgreSQL validates that the specified encoding is compatible with the requested locale. This validation happens in createdb() (in src/backend/commands/dbcommands.c), which calls check_encoding_locale_matches(). That function determines the encoding implied by the LC_CTYPE and LC_COLLATE settings and raises an error if either conflicts with the explicitly requested encoding. Locales that imply no particular encoding, such as C and POSIX, are accepted with any encoding.

For example, the locale en_US.UTF-8 implies UTF-8 encoding. If a user specifies ENCODING LATIN1 alongside LOCALE 'en_US.UTF-8', the two conflict. The C library's locale-sensitive functions (character classification, case mapping, strcoll()) would interpret the database's LATIN1 bytes as UTF-8, so non-ASCII text would be misread or treated as invalid.
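The sketch below shows the byte-level conflict in C, assuming only the standard library. The function utf8_well_formed() is hypothetical and is not PostgreSQL code. The UTF-8 spelling of "café" ends in the two bytes 0xC3 0xA9. LATIN1 stores é as the single byte 0xE9, which a UTF-8 decoder reads as the lead byte of a three-byte sequence that never completes.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified UTF-8 well-formedness check. It verifies lead bytes and
 * continuation-byte counts only; a complete validator would also reject
 * overlong forms, surrogates and code points above U+10FFFF.
 */
bool utf8_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;

        if (c < 0x80)
            extra = 0;                  /* ASCII byte */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;                  /* lead byte of a 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;                  /* lead byte of a 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;                  /* lead byte of a 4-byte sequence */
        else
            return false;               /* stray continuation or invalid byte */

        if (extra > len - i - 1)
            return false;               /* sequence runs past the end */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;           /* expected a 10xxxxxx byte */
        i += extra + 1;
    }
    return true;
}
```

With this definition, utf8_well_formed((const unsigned char *)"caf\xC3\xA9", 5) returns true. The LATIN1 bytes, utf8_well_formed((const unsigned char *)"caf\xE9", 4), return false.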

The specific error path produces a message like:

ERROR: encoding "LATIN1" does not match locale "en_US.UTF-8"
DETAIL: The chosen LC_CTYPE setting requires encoding "UTF8".
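The LC_CTYPE half of this check can be modeled in a few lines of C. This is a hypothetical, reduced sketch, not PostgreSQL's code. The enum, implied_encoding() and check_ctype_encoding() are invented for illustration. The real function has several differences:

- It asks the C library for a locale's encoding instead of parsing the name.
- It checks LC_COLLATE as well.
- It has further exceptions (for instance, SQL_ASCII for superusers) that are omitted here.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical, reduced set of encoding identifiers for this sketch. */
typedef enum { ENC_UNKNOWN = -1, ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1 } enc_t;

static const char *enc_name(enc_t e)
{
    switch (e) {
    case ENC_SQL_ASCII: return "SQL_ASCII";
    case ENC_UTF8:      return "UTF8";
    case ENC_LATIN1:    return "LATIN1";
    default:            return "unknown";
    }
}

/*
 * Hypothetical stand-in for pg_get_encoding_from_locale(). The real code asks
 * the C library; this sketch only inspects the codeset suffix of the name.
 */
enc_t implied_encoding(const char *locale)
{
    if (strcasecmp(locale, "C") == 0 || strcasecmp(locale, "POSIX") == 0)
        return ENC_SQL_ASCII;              /* C/POSIX impose no encoding */
    const char *dot = strchr(locale, '.');
    if (dot == NULL)
        return ENC_UNKNOWN;                /* no codeset in the name */
    const char *cs = dot + 1;
    if (strcasecmp(cs, "UTF-8") == 0 || strcasecmp(cs, "utf8") == 0)
        return ENC_UTF8;
    if (strcasecmp(cs, "ISO-8859-1") == 0 || strcasecmp(cs, "iso88591") == 0)
        return ENC_LATIN1;
    return ENC_UNKNOWN;
}

/*
 * Accepts when the locale implies the requested encoding, implies none
 * (C/POSIX), or cannot be determined; otherwise formats an ERROR/DETAIL
 * pair shaped like PostgreSQL's message and returns -1.
 */
int check_ctype_encoding(enc_t requested, const char *ctype,
                         char *msg, size_t msglen)
{
    enc_t implied = implied_encoding(ctype);

    if (implied == requested || implied == ENC_SQL_ASCII || implied == ENC_UNKNOWN)
        return 0;
    snprintf(msg, msglen,
             "ERROR: encoding \"%s\" does not match locale \"%s\"\n"
             "DETAIL: The chosen LC_CTYPE setting requires encoding \"%s\".",
             enc_name(requested), ctype, enc_name(implied));
    return -1;
}
```

Calling check_ctype_encoding(ENC_LATIN1, "en_US.UTF-8", buf, sizeof buf) returns -1 and fills buf with the two lines shown above. Passing "C" as the locale returns 0 for any encoding.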

Why This Test Matters Architecturally

While the patch is simple (test-only, no backend changes), it addresses a real gap:

PostgreSQL's locale handling has been significantly refactored in recent versions: ICU collations arrived in PG10, PG15 added the LOCALE_PROVIDER option so that ICU can serve as a database's default provider, and PG17 added the builtin provider.
The LOCALE option is itself a relatively recent shorthand that sets LC_COLLATE and LC_CTYPE at once.
As locale infrastructure continues to evolve, having explicit regression coverage for the encoding/locale mismatch error path ensures that refactoring doesn't accidentally remove or break this validation.

Proposed Solution

The patch adds a test case to the regression suite (most likely a statement in a file under src/test/regress/sql/, with the matching expected output under src/test/regress/expected/) that:

Attempts to create a database with intentionally incompatible settings:

CREATE DATABASE dbtest
    LOCALE 'en_US.UTF-8'
    ENCODING LATIN1
    TEMPLATE template0;

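Expects this statement to fail with the encoding/locale mismatch error.
Uses TEMPLATE template0 because PostgreSQL only lets a new database's encoding or locale differ from its template's when the template is template0, which contains no data that could be invalid under the new settings. With the default template1, the statement would also break the template-compatibility rule, muddying what the test checks.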

Design Considerations

Platform dependency: The locale en_US.UTF-8 must be available on the test system, a common concern for locale-dependent tests. Many Linux and macOS installations provide it, but it is not universal: minimal container images may omit it, and Windows uses different locale names altogether.
Test placement: This would logically belong alongside other CREATE DATABASE error-path tests, possibly in the CREATE DATABASE-specific test file or in a collation/encoding test file.
Minimal scope: The patch deliberately avoids changing any backend behavior, making it low-risk for inclusion.

Assessment

This is a straightforward, low-risk patch that fills a test gap for documented behavior. The main concern a reviewer might raise is locale availability on all test platforms. A reviewer might suggest probing for the locale first (e.g., \! locale -a, or a query against pg_collation) and skipping via psql's \if when it is missing. Alternatively, a reviewer might suggest moving the test into the TAP test infrastructure (e.g., a t/ directory under src/test/modules/), where platform-conditional logic is easier to implement.

The patch is appropriate for the current development cycle and aligns with the project's ongoing effort to improve test coverage, particularly around the increasingly complex locale/encoding infrastructure.

[PATCH] Add regression test for mismatched ENCODING and LOCALE in CREATE DATABASE

Latest Update