Add a guard against uninitialized default locale

First seen: 2026-04-24 22:44:23+00:00 · Messages: 3 · Participants: 2

Latest Update

2026-05-18 · claude-opus-4-6

Technical Analysis: Add a Guard Against Uninitialized Default Locale

The Core Problem

This thread addresses a defensive programming concern in PostgreSQL's locale/collation infrastructure: the possibility that code could dereference the default database locale before it has been initialized, resulting in a NULL pointer dereference crash.

Background: The Locale Initialization Path

PostgreSQL's collation system maintains a default database locale that is initialized during backend startup via init_database_collation(). This locale object (associated with DEFAULT_COLLATION_OID) is critical: it is the collation used for most string operations when no explicit collation is specified. Locale providers are encoded as single characters ('b' for builtin, 'c' for libc, 'i' for ICU, and 'd' for "default", meaning the database's default provider), so a valid provider is always nonzero.

The commit dbf217c1c7 (referenced in the thread) apparently fixed a specific reachable code path where the default locale could be accessed before initialization. However, Jeff Davis identifies that the general vulnerability class remains: any code that calls pg_newlocale_from_collation(DEFAULT_COLLATION_OID) before init_database_collation() has run could trigger a NULL pointer dereference.

The Extension Attack Vector

The most concrete scenario Davis identifies is an extension calling pg_newlocale_from_collation(DEFAULT_COLLATION_OID) from its _PG_init() function. For a library listed in shared_preload_libraries, _PG_init() runs when the library is loaded, very early in postmaster/backend startup and potentially before the database collation has been initialized. This is a realistic scenario: an extension that performs string comparison or pattern matching during initialization could easily reach this path.
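The ordering problem can be illustrated with a small standalone model. This is only a sketch: every name prefixed with model_ is hypothetical and nothing here is PostgreSQL code. It assumes, as described above, that the preload hook runs before the step that sets up the default locale.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; this only models the order of events. */
typedef void (*model_init_hook) (void);

static char model_default_provider;     /* '\0' until the locale step runs */
static bool model_hook_saw_locale;      /* what the hook observed */

/* Stands in for init_database_collation(). */
static void
model_init_database_collation(void)
{
    model_default_provider = 'c';
}

/* Stands in for an extension's _PG_init() that wants the default locale. */
void
model_extension_init(void)
{
    model_hook_saw_locale = (model_default_provider != '\0');
}

/* Startup in this model: preload hook first, default locale second. */
void
model_startup(model_init_hook hook)
{
    if (hook != NULL)
        hook();
    model_init_database_collation();
}

bool
model_hook_observed_locale(void)
{
    return model_hook_saw_locale;
}

char
model_current_provider(void)
{
    return model_default_provider;
}
```

Calling model_startup(model_extension_init) leaves model_hook_observed_locale() false even though the provider is set by the time startup finishes: the hook simply ran too early.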

The Proposed Solution

The patch adds guard checks at the entry points that consume the default locale, specifically in three functions:

  1. lc_collate_is_c() — checks if the collation locale is C/POSIX for collation purposes
  2. lc_ctype_is_c() — checks if the collation locale is C/POSIX for character classification
  3. pg_newlocale_from_collation() — the main function that resolves a collation OID to a locale object

The sentinel used is the provider field being '\0' (the null character). This is a clean sentinel because all valid locale providers are nonzero ASCII characters ('b', 'c', 'i', 'd'), so a zero provider unambiguously indicates that initialization hasn't occurred yet. When this sentinel is detected, the functions raise an appropriate error rather than proceeding to dereference NULL.
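A minimal sketch of the guard pattern follows. It is not the patch itself: the struct and the sketch_ functions are hypothetical stand-ins, and where the real code raises an error this model just returns false so that it can run outside the server.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a locale object; not the real pg_locale_t. */
typedef struct SketchLocale
{
    char        provider;   /* 'b', 'c' or 'i' once set up; '\0' before */
    const char *name;
} SketchLocale;

/* File-scope object without an initializer, so provider starts as '\0'. */
static SketchLocale sketch_default_locale;

/* Models the initialization step that fills in the default locale. */
void
sketch_init_default_locale(char provider, const char *name)
{
    sketch_default_locale.provider = provider;
    sketch_default_locale.name = name;
}

/*
 * Models a guarded entry point.  Returns false, leaving *out untouched,
 * when the default locale has not been initialized; the real patch
 * reports an error at this point instead.
 */
bool
sketch_get_default_locale(const SketchLocale **out)
{
    if (sketch_default_locale.provider == '\0')
        return false;           /* should not happen */

    *out = &sketch_default_locale;
    return true;
}
```

The point of the pattern is that the caller gets a well-defined failure before any field of the locale is read, rather than a crash somewhere further down the call chain.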

Version Differences: HEAD vs. PG 17

Davis notes the patch differs between HEAD and the PG 17 backport. This is expected because the collation infrastructure underwent significant refactoring in recent major versions (particularly around the builtin locale provider and the restructuring of pg_locale_t). The PG 17 version likely checks the provider sentinel in a slightly different code structure, but the defensive logic is the same.

The backport target of PG 17 (but not earlier) suggests that the relevant locale infrastructure refactoring that introduced this vulnerability pattern was part of the PG 17 development cycle, likely related to the builtin collation provider work that Davis himself led.

Design Decisions and Tradeoffs

Why a Guard Rather Than Reordering Initialization?

The choice to add runtime guards rather than ensuring initialization always precedes any possible consumer is a deliberate defense-in-depth strategy. Reordering startup to guarantee initialization happens first would be fragile — it can't account for extension code in _PG_init() or future code reorganization. The guard approach is robust against unknown future callers.

The '\0' Sentinel

Using the zero byte as a sentinel for "uninitialized" is elegant: it relies on the fact that C zero-initializes static/global data, so the provider field will naturally be '\0' before any explicit initialization. This avoids needing a separate boolean is_initialized flag, keeping the structure lean. Ayush's review comment about the "magical" nature of this check is valid from a readability perspective — the connection between '\0' and "not yet initialized" is implicit and relies on understanding both the valid provider values and C initialization semantics.
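The reliance on zero-initialization, and one way the implicit meaning of '\0' could be spelled out, can be shown in a few lines. This is an illustration only: DEMO_PROVIDER_UNSET and the demo_ functions are hypothetical names, not identifiers from PostgreSQL or from the patch, and naming the sentinel is not something proposed in the thread.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of these exist in PostgreSQL. */
#define DEMO_PROVIDER_UNSET '\0'

typedef struct DemoLocale
{
    char provider;
    bool deterministic;
} DemoLocale;

/*
 * No initializer: objects with static storage duration start with every
 * member zeroed, so provider == '\0' until something assigns it.
 */
static DemoLocale demo_default_locale;

/* Gives the sentinel comparison a name instead of a bare '\0' test. */
bool
demo_locale_is_initialized(const DemoLocale *loc)
{
    return loc->provider != DEMO_PROVIDER_UNSET;
}

const DemoLocale *
demo_default(void)
{
    return &demo_default_locale;
}

void
demo_set_provider(char provider)
{
    demo_default_locale.provider = provider;
}
```

No extra flag field is needed in the struct; the cost is that readers must know that zero is never a valid provider, which is the readability concern raised in review.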

Comment Style Consistency

Davis appears to prefer a uniform /* should not happen */ comment across all three guarded sites and across versions. Ayush suggests a more descriptive comment for the PG 17 backport specifically, but Davis's preference for consistency has merit: it signals that the guard is for the same class of problem everywhere, and avoids comments that could become stale if initialization function names change.

Assessment

This is a low-risk, high-value defensive patch: it is clean, and the review feedback is minor (comment wording only). This is the kind of hardening work that prevents mysterious crash reports from extension authors months or years later.