[PATCH] Add regression test for mismatched ENCODING and LOCALE in CREATE DATABASE

First seen: 2026-05-23 12:10:26+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-25 · claude-opus-4-6

Analysis: Add Regression Test for Mismatched ENCODING and LOCALE in CREATE DATABASE

Core Problem

The PostgreSQL documentation for CREATE DATABASE explicitly states that encoding and locale settings must be compatible, and that an error will be reported if they are not. However, the existing regression test suite lacks coverage for this specific failure mode. This means:

  1. No regression guard exists to ensure the mismatch detection continues working correctly if the underlying validation logic is refactored.
  2. Documentation-code parity is not verified — the documented behavior could theoretically diverge from actual behavior without any test catching it.

Technical Context

How Encoding/Locale Validation Works in PostgreSQL

When CREATE DATABASE is executed, PostgreSQL validates that the specified encoding is compatible with the requested locale. This validation occurs in createdb() (in src/backend/commands/dbcommands.c). The key function involved is check_encoding_locale_matches(), which determines the encoding implied by the LC_CTYPE and LC_COLLATE settings and verifies that it is compatible with the explicitly requested encoding.

For example, the locale en_US.UTF-8 implies UTF-8 encoding. If a user specifies ENCODING LATIN1 alongside LOCALE 'en_US.UTF-8', the two are incompatible: the locale's character classification and collation routines expect UTF-8 input and would misinterpret text stored as LATIN1 (ISO-8859-1) bytes.

The specific error path produces a message like:

ERROR: encoding "LATIN1" does not match locale "en_US.UTF-8"
DETAIL: The chosen LC_CTYPE setting requires encoding "UTF8".
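
To see which encoding and locale pairings have already passed this validation on a given cluster, the catalog can be queried directly. This is a small illustrative query and not part of the patch; it relies only on the pg_database catalog and the built-in pg_encoding_to_char() function.

```sql
-- List each database with its encoding name and the libc locale
-- settings that were checked against that encoding at creation time.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```

Every row should show an LC_CTYPE whose implied encoding matches the encoding column, or a locale such as C or POSIX, which is compatible with any encoding.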

Why This Test Matters Architecturally

While the patch is simple (test-only, no backend changes), it addresses a real gap: the documented mismatch error currently has no regression coverage.

Proposed Solution

The patch adds a test case to the regression suite (likely a script under src/test/regress/sql/ with its matching expected output under src/test/regress/expected/) that:

  1. Attempts to create a database with intentionally incompatible settings:
    CREATE DATABASE dbtest
        LOCALE 'en_US.UTF-8'
        ENCODING LATIN1
        TEMPLATE template0;
    
  2. Expects this statement to fail with an appropriate error message.
  3. Uses TEMPLATE template0 because CREATE DATABASE only accepts an encoding or locale different from the template's when copying from template0, which is known to contain no data that depends on those settings.

Design Considerations

Assessment

This is a straightforward, low-risk patch that closes a gap between documented behavior and test coverage. The main concern a reviewer might raise is locale availability: en_US.UTF-8 is not installed on every test platform, and where it is missing the statement would fail with a different error than the one the test expects. A reviewer might suggest \! locale -a checks or conditional test logic, or might suggest moving the test into the TAP test infrastructure (a t/ directory, for example under src/test/modules/), where platform-conditional logic is easier to implement.
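
One way the locale-availability concern could be handled inside the regular regression suite is a psql guard at the top of the test script. The sketch below is an assumption and not part of the patch. It follows the skip_test convention used by existing locale-dependent tests such as collate.linux.utf8.sql, and it assumes that initdb imported an en_US collation for UTF8 into pg_collation whenever the operating system provides that locale.

```sql
-- Skip the test when no en_US collation for UTF8 was imported at initdb time,
-- which is taken here as a proxy for the OS locale being unavailable.
SELECT NOT EXISTS (
           SELECT 1
           FROM pg_collation
           WHERE collname IN ('en_US', 'en_US.utf8')
             AND collencoding = pg_char_to_encoding('UTF8')
       ) AS skip_test \gset
\if :skip_test
\quit
\endif

-- Reached only when the locale appears to be available; expected to fail
-- with the encoding/locale mismatch error.
CREATE DATABASE dbtest
    LOCALE 'en_US.UTF-8'
    ENCODING LATIN1
    TEMPLATE template0;
```

A script that quits early this way also needs an alternative expected-output file covering the skipped case, as the existing locale-dependent tests have.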

The patch is appropriate for the current development cycle and aligns with the project's ongoing effort to improve test coverage, particularly around the increasingly complex locale/encoding infrastructure.