Support for 8-byte TOAST values, round two

First seen: 2026-05-08 06:07:13+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-09 · opus 4.7

Support for 8-byte TOAST values, round two — Technical Analysis

The Core Problem: TOAST OID Exhaustion

PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) mechanism stores large field values out-of-line in a companion TOAST relation. Each out-of-line value is identified by a 4-byte Oid (chunk_id), allocated from a cluster-wide OID counter with collision checks against the per-TOAST-table unique index on chunk_id.

This 32-bit identifier space has become an operational hazard on large, write-heavy systems:

  1. Collision-driven slowdowns at scale. GetNewOidWithIndex() must loop and probe the TOAST index when the space becomes densely populated. On a single TOAST relation approaching billions of live chunk_ids, insert latency becomes dominated by OID collision retries, and in pathological cases can live-lock.
  2. Hard ceiling. A TOAST relation cannot store more than ~2^32 distinct live chunk_id values. Because every external datum is located by the pair (toastrelid, chunk_id), this caps how many out-of-line values a single relation can hold at any one time.
  3. Wraparound semantics for OIDs historically skip values below FirstNormalObjectId, further shrinking the usable space and making wraparound-driven collisions more frequent once the counter laps.
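
The retry behaviour described in point 1 can be illustrated with a toy model. This is not PostgreSQL's GetNewOidWithIndex(); it is a minimal sketch that assumes a tiny 16-bit ID space, a boolean array standing in for the TOAST index probe, and hypothetical names (ToyAllocator, toy_alloc_id, TOY_FIRST_NORMAL) chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a 16-bit ID space stands in for the 32-bit OID space. */
#define TOY_SPACE        65536u
#define TOY_FIRST_NORMAL 16u    /* hypothetical reserved low range */

typedef struct ToyAllocator
{
    uint16_t next;              /* shared counter; wraps at 2^16 */
    bool     used[TOY_SPACE];   /* stands in for the unique-index probe */
} ToyAllocator;

/*
 * Hand out an unused ID. *retries counts candidates that were rejected
 * because they were already live. Returns false once every usable ID
 * is taken.
 */
static bool
toy_alloc_id(ToyAllocator *a, uint16_t *id, uint32_t *retries)
{
    assert(a != NULL && id != NULL && retries != NULL);
    *retries = 0;

    for (uint32_t tries = 0; tries < TOY_SPACE; tries++)
    {
        uint16_t candidate = a->next;

        a->next = (uint16_t) (a->next + 1);   /* wraps to 0 after 65535 */

        if (candidate < TOY_FIRST_NORMAL)
            continue;                         /* reserved range is skipped */
        if (!a->used[candidate])
        {
            a->used[candidate] = true;
            *id = candidate;
            return true;
        }
        (*retries)++;                         /* collision: try the next one */
    }
    return false;
}
```

In this model, once the counter laps onto a run of N live IDs, a single allocation pays N rejected probes before it succeeds. That is the same shape of cost the list above attributes to a densely populated TOAST relation, just at a much smaller scale.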

The goal of this patch set is to lift that ceiling by allowing a TOAST relation to use a 64-bit identifier (Oid8) instead of a 32-bit Oid, while retaining full binary compatibility for existing clusters that don't opt in.

Why This Is Round Two: Rejection of the Callback Approach

The prior revision (v1–v19, referenced by message aFOnKHG7Wn-Srnpv@paquier.xyz) abstracted over the varlena external pointer variants using function pointer dispatch — a callback per vartag_external. Reviewer feedback was decisive on two points:

Michael Paquier's v20 abandons that design entirely in favor of what he calls the "brutal" approach: open-coded branching on either (a) the vartag_external of an in-memory varlena datum, or (b) the atttype of the TOAST relation's chunk_id column when reading from disk. No function pointers, no vtables — just two concrete shapes handled inline.

The Design

Opt-in per table via reloption

A new toast_value_type reloption takes values oid (default, current behavior) or oid8. Setting this at CREATE TABLE time causes the TOAST relation to be built with chunk_id of type oid8 instead of oid. Crucially:

The Oid8 counter and wraparound

The 64-bit TOAST value ID is allocated from a new counter persisted in pg_control, extended by 4 bytes. Reusing the control file rather than introducing a new SLRU or file avoids crash-recovery complications; pg_control is already fsync'd on checkpoint and replicated via base backup and streaming.

The counter preserves the existing semantics of skipping the [0, FirstNormalObjectId) range on wraparound. Note the subtlety: with 64 bits, wraparound is effectively unreachable at any realistic write rate, but the skip logic is retained for consistency with how OIDs behave and to keep the low range reserved for future sentinel use.
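
The skip-on-wraparound rule can be sketched in a few lines. This is an illustration of the described semantics, not the patch's code: demo_next_oid8 and DEMO_FIRST_NORMAL_ID are hypothetical names, and the sketch ignores locking, WAL logging, and persistence entirely. The constant 16384 is the value of FirstNormalObjectId in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric value as PostgreSQL's FirstNormalObjectId. */
#define DEMO_FIRST_NORMAL_ID UINT64_C(16384)

/*
 * Return the next 64-bit value ID and advance the counter.
 * Values in [0, DEMO_FIRST_NORMAL_ID) are never handed out, both at
 * first use and after the counter wraps past UINT64_MAX.
 */
static uint64_t
demo_next_oid8(uint64_t *counter)
{
    assert(counter != NULL);

    if (*counter < DEMO_FIRST_NORMAL_ID)
        *counter = DEMO_FIRST_NORMAL_ID;    /* skip the reserved low range */

    return (*counter)++;                    /* unsigned wrap is well defined */
}
```

To put "effectively unreachable" in numbers: at an assumed one million allocations per second, exhausting 2^64 values would take roughly 584,000 years.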

The new varlena external tag

A new vartag_external value is introduced alongside a new on-disk / in-memory struct:

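typedef struct varatt_external_oid8
{
    int32   va_rawsize;     /* original data size (includes header) */
    uint32  va_extinfo;     /* external saved size and compression method */
    uint32  va_valueid_lo;  /* low 32 bits of the 64-bit TOAST value ID */
    uint32  va_valueid_hi;  /* high 32 bits of the 64-bit TOAST value ID */
    Oid     va_toastrelid;  /* OID of the TOAST table holding the value */
} varatt_external_oid8;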

Two design points deserve attention:

  1. Split into _lo / _hi uint32 pair instead of a single uint64. This is deliberate to avoid 8-byte alignment padding inside the struct on platforms where uint64 forces 8-byte alignment. The struct as declared is 20 bytes with no internal padding, matching how varatt_external (16 bytes) is treated as a packed on-disk representation via memcpy through VARATT_EXTERNAL_GET_POINTER/SET_POINTER macros. Preserving the pack-without-padding invariant is important because these structs are embedded directly into heap tuples as part of the varlena datum; any padding difference would change the on-disk footprint across architectures.
  2. The tag, not the atttype, drives in-memory dispatch. Once a tuple is loaded, all that's available is the varlena datum itself. The vartag_external byte that follows the one-byte varlena header (read with the VARTAG_1B_E macro) tells detoast code which struct shape to read. Conversely, when a backend is creating a new external pointer (in toast_save_datum / heap_toast_insert_or_update), it must consult the TOAST relation's chunk_id atttype to decide which shape to write. This is the two-sided dispatch Paquier refers to.
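
Both design points can be checked with a small standalone sketch. Everything named demo_* or DEMO_* below is hypothetical: the tag values are arbitrary, the buffer holds only a tag byte plus the struct bytes (a real varlena datum also carries a header byte before the tag), and the functions are stand-ins for the idea rather than the patch's code. The sketch assumes 32-bit fields throughout, as in the struct above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t DemoOid;               /* stand-in for PostgreSQL's Oid */

typedef struct demo_external_oid        /* mirrors the 16-byte layout */
{
    int32_t  va_rawsize;
    uint32_t va_extinfo;
    DemoOid  va_valueid;
    DemoOid  va_toastrelid;
} demo_external_oid;

typedef struct demo_external_oid8       /* mirrors the 20-byte layout */
{
    int32_t  va_rawsize;
    uint32_t va_extinfo;
    uint32_t va_valueid_lo;
    uint32_t va_valueid_hi;
    DemoOid  va_toastrelid;
} demo_external_oid8;

/* Five 4-byte fields need no padding; a uint64 member could add some. */
_Static_assert(sizeof(demo_external_oid) == 16, "expected 16 bytes");
_Static_assert(sizeof(demo_external_oid8) == 20, "expected 20 bytes");

enum { DEMO_TAG_OID = 1, DEMO_TAG_OID8 = 2 };   /* arbitrary demo values */
typedef enum { DEMO_CHUNK_OID, DEMO_CHUNK_OID8 } DemoChunkIdType;

/* Write side: the chunk_id column type picks the shape. Returns bytes used. */
static size_t
demo_write_pointer(uint8_t *buf, DemoChunkIdType chunk_id_type,
                   int32_t rawsize, uint32_t extinfo,
                   uint64_t valueid, DemoOid toastrelid)
{
    if (chunk_id_type == DEMO_CHUNK_OID8)
    {
        demo_external_oid8 p;

        p.va_rawsize = rawsize;
        p.va_extinfo = extinfo;
        p.va_valueid_lo = (uint32_t) valueid;
        p.va_valueid_hi = (uint32_t) (valueid >> 32);
        p.va_toastrelid = toastrelid;
        buf[0] = DEMO_TAG_OID8;
        memcpy(buf + 1, &p, sizeof(p));     /* unaligned-safe copy */
        return 1 + sizeof(p);
    }
    else
    {
        demo_external_oid p;

        assert(valueid <= UINT32_MAX);      /* must fit the 4-byte shape */
        p.va_rawsize = rawsize;
        p.va_extinfo = extinfo;
        p.va_valueid = (DemoOid) valueid;
        p.va_toastrelid = toastrelid;
        buf[0] = DEMO_TAG_OID;
        memcpy(buf + 1, &p, sizeof(p));
        return 1 + sizeof(p);
    }
}

/* Read side: only the tag byte is available, so branch on it. */
static uint64_t
demo_read_valueid(const uint8_t *buf)
{
    if (buf[0] == DEMO_TAG_OID8)
    {
        demo_external_oid8 p;

        memcpy(&p, buf + 1, sizeof(p));
        return ((uint64_t) p.va_valueid_hi << 32) | p.va_valueid_lo;
    }

    assert(buf[0] == DEMO_TAG_OID);
    demo_external_oid p;

    memcpy(&p, buf + 1, sizeof(p));
    return p.va_valueid;
}
```

The two functions branch on different inputs, which is the point of the two-sided dispatch: the writer knows the relation and so looks at the column type, while the reader has only the bytes in front of it and so looks at the tag.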

Layering of the patch set

The patch set is deliberately stacked so that ground-work lands independently of the user-visible feature:

  1. Renames and cleanup of varatt_external references — making the "OID-ness" explicit in identifiers so the eventual oid8 variant reads naturally alongside.
  2. Infrastructure for the Oid8 counter in pg_control, and for carrying the TOAST value type through the catalog / reloption machinery.
  3. Feature patch (last) adds the new vartag_external and the varatt_external_oid8 struct.

An interesting property Paquier highlights: if you apply every patch except the last, an oid8 TOAST table works — but using the existing varatt_external with a truncated identifier. This is not a useful configuration in itself but makes each patch independently reviewable and testable, which matters given the breadth of code touched (detoast, compression, reorderbuffer.c, amcheck).

Touched Subsystems and Why Each Matters

Paquier reports the final feature patch footprint is modest — 9 files, +537 / -208 — which is credible only because the ground-work patches absorbed most of the churn.

Key Tradeoffs and Open Questions

  1. No in-place conversion is the single biggest usability compromise. Users with an existing hot TOAST table approaching the OID ceiling must do a logical migration. This is the right call for a first cut but will likely generate follow-up requests.
  2. Reloption granularity is per-table. There is no cluster-wide default that makes every new table oid8. Installations that want oid8 everywhere would have to set the reloption on each CREATE TABLE or via a template. A GUC-driven default would be a natural follow-up.
  3. pg_control growth. Adding 4 bytes to pg_control is cheap but is an on-disk format change requiring pg_upgrade handling and a PG_CONTROL_VERSION bump. This is routine for major versions.
  4. The "brutal" approach leaves the door open to a third variant. If someone later proposes, say, a variable-length value ID, they'd have to add yet another vartag_external and another open-coded branch. The callback approach would have absorbed new variants more gracefully — a point reviewers may revisit, though Paquier's performance argument remains strong.

Weight of the Proposer

Michael Paquier is a longtime PostgreSQL committer with deep familiarity with the storage, TOAST, and pg_control subsystems. The fact that this is a determined v20 resubmission after a round of pushback, with the design substantially rethought rather than merely patched, signals both personal investment and technical seriousness. The mention of pgconf.dev discussion and explicit targeting of v20 (the PostgreSQL major version) indicates this is being staged for a full release cycle of review.

Status at This Point in the Thread

Only the initial post exists in this excerpt. No reviewer responses, benchmark numbers, or committer pushback on the new design have appeared yet. The real technical debate will play out in subsequent messages: whether the "brutal" dispatch remains clean as more call sites are touched, whether pg_control is the right home for the counter, and whether in-place conversion can be bolted on later.