What's New in This Round
This round contains only two messages: Ishii adding pgsql-hackers to the CC (a procedural action with no technical content), and chenloveit's reply, which introduces a new architectural argument against the GUC approach. The argument comes from the OP himself and reverses his own earlier design.
chenloveit abandons his own GUC prototype in favor of encoding variants
chenloveit now argues against the GUC-based encoding_validation parameter he himself implemented in the GitHub prototype, citing a concrete pg_dumpall failure scenario:
- Database populated under encoding_validation = 'native' (permissive)
- Cluster dumped via pg_dumpall
- New cluster initialized with encoding_validation = 'read_compatible' (strict)
- Restore fails because previously-accepted bytes are now rejected
This is the standard "dump/restore asymmetry" argument that kills most GUC-gated strictness proposals in PostgreSQL. It's the same class of problem that affects standard_conforming_strings transitions and similar behavioral GUCs.
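The failure sequence above can be sketched as a small simulation. This is a toy model, not PostgreSQL code: the Cluster class, the restore helper, and both byte-range rules are hypothetical stand-ins chosen only so that one mode accepts bytes the other rejects. The thread does not specify the actual rules behind 'native' and 'read_compatible'.

```python
from typing import List


def valid_permissive(data: bytes) -> bool:
    """Assumed stand-in for 'native': any high-bit lead byte plus one trailing byte."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:          # ASCII, single byte
            i += 1
        elif i + 1 < len(data):     # two-byte character, no range check
            i += 2
        else:                       # truncated two-byte character
            return False
    return True


def valid_strict(data: bytes) -> bool:
    """Assumed stand-in for 'read_compatible': both bytes must lie in 0xA1..0xFE."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
        elif (i + 1 < len(data)
              and 0xA1 <= data[i] <= 0xFE
              and 0xA1 <= data[i + 1] <= 0xFE):
            i += 2
        else:
            return False
    return True


VALIDATORS = {"native": valid_permissive, "read_compatible": valid_strict}


class Cluster:
    """Toy cluster whose validation rule comes from a cluster-level setting."""

    def __init__(self, encoding_validation: str) -> None:
        self.validate = VALIDATORS[encoding_validation]
        self.rows: List[bytes] = []

    def insert(self, value: bytes) -> None:
        if not self.validate(value):
            raise ValueError(f"invalid byte sequence: {value!r}")
        self.rows.append(value)

    def dump(self) -> List[bytes]:
        # The dump carries the data only, not the setting it was validated under.
        return list(self.rows)


def restore(dump: List[bytes], target: Cluster) -> None:
    for row in dump:
        target.insert(row)


old = Cluster("native")
old.insert(b"\xd6\xd0")   # passes both rules
old.insert(b"\x81\x40")   # passes only the permissive rule

new = Cluster("read_compatible")
try:
    restore(old.dump(), new)
    restore_ok = True
except ValueError:
    restore_ok = False
```

The point of the sketch is that the dump contains no record of the setting under which its rows were accepted, so the target cluster's setting alone decides whether the restore succeeds.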
New proposal: encoding variants rather than configuration
chenloveit's alternative is architecturally distinct: rather than a runtime switch, register new encoding identifiers (e.g., a strict EUC_CN_STRICT or similar variant) as first-class encodings in PostgreSQL's built-in encoding list (the pg_enc enum). This means:
- Validation strictness is a property of the encoding itself, set at CREATE DATABASE time and immutable thereafter.
- No dump/restore ambiguity: the encoding name in the dump carries the semantics.
- No per-session variability: the database's encoding is its validation contract.
- Coexistence is possible: one database can be EUC_CN (legacy permissive) while another is the strict variant.
This directly addresses Ishii's objection about per-encoding granularity and the dump/restore problem in one stroke, at the cost of encoding-namespace proliferation and the need to wire up new pg_wchar_tbl entries, conversion procs, etc.
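A minimal sketch of the encoding-variant idea, again as a toy model rather than PostgreSQL internals. The ENCODINGS registry, the Database class, and the byte-range rules are hypothetical; EUC_CN_STRICT is the illustrative name used in the discussion, not an existing encoding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def _check(data: bytes, lo: int) -> bool:
    """Assumed rule: two-byte characters whose bytes must both be >= lo and <= 0xFE."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
        elif (i + 1 < len(data)
              and lo <= data[i] <= 0xFE
              and lo <= data[i + 1] <= 0xFE):
            i += 2
        else:
            return False
    return True


@dataclass(frozen=True)
class Encoding:
    name: str
    validate: Callable[[bytes], bool]


# Hypothetical registry: strictness is a property of the encoding entry itself.
ENCODINGS: Dict[str, Encoding] = {
    "EUC_CN": Encoding("EUC_CN", lambda b: _check(b, 0x40)),                # permissive stand-in
    "EUC_CN_STRICT": Encoding("EUC_CN_STRICT", lambda b: _check(b, 0xA1)),  # strict stand-in
}


class Database:
    """Toy database whose encoding is chosen at creation and never changed."""

    def __init__(self, name: str, encoding_name: str) -> None:
        self.name = name
        self._encoding = ENCODINGS[encoding_name]
        self.rows: List[bytes] = []

    @property
    def encoding(self) -> str:
        return self._encoding.name

    def insert(self, value: bytes) -> None:
        if not self._encoding.validate(value):
            raise ValueError(f"invalid byte sequence for {self.encoding}: {value!r}")
        self.rows.append(value)

    def dump(self) -> Tuple[str, List[bytes]]:
        # The encoding name travels with the data, so the rule travels too.
        return self.encoding, list(self.rows)


def restore(dump: Tuple[str, List[bytes]]) -> Database:
    encoding_name, rows = dump
    db = Database("restored", encoding_name)
    for row in rows:
        db.insert(row)
    return db


legacy = Database("legacy", "EUC_CN")
legacy.insert(b"\x81\x40")            # accepted under the permissive stand-in
restored = restore(legacy.dump())     # same encoding name, same rule, so it succeeds

strict = Database("strict", "EUC_CN_STRICT")
try:
    strict.insert(b"\x81\x40")
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

Because the dump names the encoding, the restored database is validated under the same rule as the original, while a strict database can coexist with it and still reject the same bytes.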
Significance
This is a meaningful design evolution: the thread has now cycled through three mechanism proposals (documentation-only → global GUC → encoding variants), each addressing objections raised against the prior one. The encoding-variant approach is the first that simultaneously satisfies:
- Ishii's "per-encoding, not global" requirement
- The dump/restore consistency requirement
- The backward-compatibility requirement (existing EUC_CN databases unchanged)
However, it introduces its own complications: encoding proliferation, whether ICU/libc locale combinations work with variant names, and how pg_upgrade would handle a database that is meant to move from the permissive encoding to the strict variant. No committer has yet reacted to this proposal.