Use ereport() instead of elog() for invalid weights in setweight()

First seen: 2026-06-03 15:39:05+00:00 · Messages: 2 · Participants: 2

Latest Update

2026-06-04 · claude-opus-4-6

Technical Analysis: Use ereport() instead of elog() for invalid weights in setweight()

Core Problem

The setweight() function in PostgreSQL's full-text search subsystem (tsvector_op.c) uses elog(ERROR, ...) to reject invalid weight arguments. Because elog() does not take an explicit error code, the error is reported with SQLSTATE XX000 (internal_error). That classification is wrong here: the weight comes directly from user input, so a bad value is a user error, not a violated internal invariant. The distinction matters for:

  1. Client error handling: Applications that catch errors by SQLSTATE class cannot distinguish a genuine internal bug (XX000) from a simple invalid-parameter error. ERRCODE_INVALID_PARAMETER_VALUE (22023) is the correct classification.
  2. Error message quality: The existing elog() call prints the invalid weight as its numeric character code (e.g., "unrecognized weight: 112" for 'p'), which is unhelpful to end users.
  3. Consistency: The adjacent function ts_filter() in the same source file already correctly uses ereport() with ERRCODE_INVALID_PARAMETER_VALUE for the equivalent validation, making the inconsistency an obvious oversight.

This falls within a broader, ongoing PostgreSQL effort to find user-reachable error paths that wrongly use internal error codes. The thread refers to earlier discussion of this cleanup in connection with Michaël Paquier's work.

Affected Code Paths

Three weight-validation paths in tsvector_op.c are relevant:

  1. tsvector_setweight() (two-argument form): The standard setweight(tsvector, "char") function that assigns a weight label to all lexemes.
  2. tsvector_setweight_by_filter(): The three-argument variant setweight(tsvector, "char", text[]) that assigns weight only to specified lexemes.
  3. Weight validation in ts_filter(): Already uses ereport() with ERRCODE_INVALID_PARAMETER_VALUE and served as the model for the fix, although, as discussed later, its message has a latent encoding problem of its own.

The tsvector weight system supports four labels: A, B, C, D (case-insensitive). Any other character value should produce a clear user-facing error.

Proposed Solutions

v1 (Ewan Young)

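A straightforward patch that replaces the elog() calls in both setweight() variants with ereport() using ERRCODE_INVALID_PARAMETER_VALUE. It follows the existing ts_filter() code and prints the offending weight as a character rather than a number.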

v2 (Tom Lane)

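A more architecturally considered revision that moves weight parsing into a shared helper, parse_weight. It prints the offending weight as a character only when that is safe to do, which also fixes the latent problem in ts_filter() described below.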

Key Design Tradeoff: Encoding Safety vs. Message Clarity

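The most interesting technical insight in this thread is Tom Lane's explanation of why the original code printed weights numerically. This was not mere laziness but a (poorly documented) defense against encoding problems. A "char" argument is a single byte, so a user can pass a non-ASCII value, such as one byte of a multibyte UTF-8 character. If that byte were inserted verbatim with %c, the error message would contain a fragment of a character and would be invalid in the server encoding. Printing the numeric value avoids the problem.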

The ts_filter() code used as the model for the fix turns out to have the same hazard: it prints the character directly without checking for non-ASCII values. Tom Lane's v2 fixes this latent bug as well.

Architectural Implications

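The refactoring into parse_weight is a minor but well-motivated application of DRY (Don't Repeat Yourself) principles in PostgreSQL's codebase. The helper likely holds both the case-insensitive mapping of A through D to internal weight codes and the error report, so every caller validates weights and words the error identically.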

This is consistent with PostgreSQL's general evolution toward cleaner internal APIs and away from ad-hoc inline validation.