Patch for migration of the pg_commit_ts directory

First seen: 2025-04-03 18:27:32+00:00 · Messages: 49 · Participants: 6

Latest Update

2026-05-14 · claude-opus-4-6

Technical Analysis: Migration of pg_commit_ts Directory During pg_upgrade

Core Problem

PostgreSQL's pg_upgrade utility does not migrate the pg_commit_ts directory when upgrading a cluster. This directory stores commit timestamps for transactions when track_commit_timestamp is enabled — data that maps transaction IDs (XIDs) to their commit timestamps and replication origin IDs. Users who rely on this feature lose all commit timestamp history upon every major version upgrade, forcing them to manually copy the directory and run pg_resetwal — a fragile, undocumented workaround.

Architectural Context

The pg_commit_ts subsystem (introduced in PostgreSQL 9.5) stores fixed-size records mapping XIDs to (timestamp, replication_origin_id) tuples in SLRU-managed segment files under pg_commit_ts/. The control file tracks oldestCommitTsXid and newestCommitTsXid to bound the valid range.

During pg_upgrade, the tool uses pg_resetwal to set these control file fields in the new cluster. Previously, both were set to chkpnt_nxtxid (effectively invalidating all old commit timestamp data). The patch modifies this to preserve the old cluster's actual oldest/newest values when the data is migrated.

Proposed Solution (Evolution)

Initial Approach (v1-v7)

  1. Copy pg_commit_ts directory from old to new cluster during upgrade
  2. Pass correct oldestCommitTsXid/newestCommitTsXid to pg_resetwal instead of zeroing them out
  3. Check track_commit_timestamp setting on the new cluster — error out if disabled but old cluster has commit_ts data

Key Technical Changes

Critical Complication: Replication Origin ID Mismatch

Masahiko Sawada identified a fundamental flaw: the commit timestamp records store roident (replication origin ID), but pg_upgrade reassigns OIDs to subscriptions, causing origin IDs to change. This means:

This is not merely a cosmetic issue — it can cause incorrect conflict detection behavior that defeats one of the primary use cases for migrating commit timestamps in the first place.

Later Approaches (v12-v14)

Key Design Decisions & Disagreements

Error vs. Warning on Parameter Mismatch

Mandatory vs. Optional Migration

How to Detect New Cluster Configuration

Use Cases Established

  1. Audit/forensics: Knowing when any record was last modified (patch author's primary use case — 2-year retention for detecting application errors)
  2. Logical replication conflict detection: update_origin_differs and delete_origin_differs conflicts rely on commit timestamps surviving upgrade
  3. External system integration: Pulling records modified within a time window

Hayato Kuroda demonstrated use case #2 works correctly (when origin IDs are stable) through a detailed pub-sub upgrade test.

Current Status

The patch is at v14 with unresolved architectural concerns:

Technical Subtleties

  1. Version gating: Commit timestamps only exist in PG ≥ 9.5, so the patch gates on major_version >= 905
  2. Zero initialization: Global ClusterInfo structs are zero-initialized, making explicit zeroing of commit_ts fields unnecessary (Kuroda noted this)
  3. Boolean GUC representation: pg_settings.setting always shows "on"/"off" for booleans regardless of how the user specified the value (true/yes/1 all normalize to "on")
  4. SLRU segment files: The copy is a simple file-level copy since SLRU segments are self-contained and XID-indexed