Init connection time grows quadratically

First seen: 2025-05-27 10:45:31+00:00 · Messages: 7 · Participants: 3

Latest Update

2026-06-04 · claude-opus-4-6

Init Connection Time Grows Quadratically: Technical Analysis

Core Problem

The thread investigates a surprising performance characteristic: PostgreSQL's initial connection time (ICT) — the time from pgbench process start until all N connections are established — grows quadratically (O(n²)) with the number of clients, rather than the expected linear O(n) scaling.

Measurements on REL_16_STABLE show this pattern. The curve fits y ≈ 0.0002x², where x is the number of clients and y is the ICT, so each additional connection becomes more expensive as the total number of connections grows.

Architectural Context

The PGPROC Array and ProcArrayAdd

PostgreSQL maintains a shared-memory array of PGPROC structures — one per backend process. Each PGPROC is a relatively large structure containing transaction state, lock information, and various metadata. When a new backend connects, ProcArrayAdd() is called to register it in the global process array.

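Critically, each PGPROC has a field pgxactoff, which stores the backend's offset in the ProcArray. The ProcArray is kept sorted by process number, so when ProcArrayAdd inserts a new entry it must rewrite this field for every entry that follows the insertion point: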

allProcs[procno].pgxactoff = index;

This seemingly simple assignment becomes pathological at scale because:

  1. PGPROC structures are large (~hundreds of bytes each), meaning they span many memory pages
  2. Process numbers (procno) are assigned non-sequentially after initial warmup — backends that disconnect and reconnect get reused slots scattered across the array
  3. With huge_pages=off, the standard 4KB page size means the PGPROC array spans thousands of pages
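
One way the assignment above can add up to quadratic total work is if each new registration has to rewrite the offset of every entry that sorts after it. The sketch below is a simplified model, not PostgreSQL source: ModelProc, ModelProcArray, model_proc_array_add and count_fixups_descending are hypothetical names, and it assumes the worst case in which every new procno sorts ahead of all existing entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_PROCS 1024    /* hypothetical capacity for this model */

/* Hypothetical stand-in for the one PGPROC field this model needs. */
typedef struct
{
    int pgxactoff;              /* position of this slot in the sorted array */
} ModelProc;

typedef struct
{
    int       numProcs;                     /* entries currently registered */
    int       pgprocnos[MODEL_MAX_PROCS];   /* procnos, kept sorted ascending */
    ModelProc allProcs[MODEL_MAX_PROCS];    /* indexed by procno */
} ModelProcArray;

/*
 * Insert procno at its sorted position and renumber every entry after it.
 * Returns the number of pgxactoff fix-up writes made to other slots.
 */
long
model_proc_array_add(ModelProcArray *arr, int procno)
{
    int  index;
    long fixups = 0;

    assert(arr->numProcs < MODEL_MAX_PROCS);
    assert(procno >= 0 && procno < MODEL_MAX_PROCS);

    /* Find the first entry with a larger procno. */
    for (index = 0; index < arr->numProcs; index++)
        if (arr->pgprocnos[index] > procno)
            break;

    /* Open a gap at index and store the new entry there. */
    memmove(&arr->pgprocnos[index + 1], &arr->pgprocnos[index],
            (size_t) (arr->numProcs - index) * sizeof(int));
    arr->pgprocnos[index] = procno;
    arr->allProcs[procno].pgxactoff = index;
    arr->numProcs++;

    /* Every following entry moved by one, so its stored offset is stale. */
    for (index++; index < arr->numProcs; index++)
    {
        arr->allProcs[arr->pgprocnos[index]].pgxactoff = index;
        fixups++;
    }
    return fixups;
}

/* Register n backends with descending procnos; return total fix-up writes. */
long
count_fixups_descending(int n)
{
    static ModelProcArray arr;
    long total = 0;

    assert(n >= 0 && n <= MODEL_MAX_PROCS);
    memset(&arr, 0, sizeof(arr));
    for (int procno = n - 1; procno >= 0; procno--)
        total += model_proc_array_add(&arr, procno);
    return total;
}
```

Under these assumptions the k-th registration performs k - 1 fix-up writes, so n registrations perform n(n-1)/2 in total, and doubling n roughly quadruples the work. If procnos instead arrive in ascending order, the model performs no fix-ups at all.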

The Degradation Pattern

Maksim Melnikov's investigation reveals a crucial secondary finding: ICT degrades across repeated pgbench iterations without a server restart, eventually stabilizing at worse values than the first run. This is consistent with page-fault overhead growing as accesses to the PGPROC array lose locality:

  1. First iteration: PGPROC slots are allocated sequentially, so pgxactoff writes hit consecutive memory pages — good spatial locality
  2. Subsequent iterations: After backends disconnect and reconnect, PGPROC slot reuse creates a random access pattern across the array
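  3. Minor page faults: Each backend is a separate process with its own page tables, so the first time it touches a shared-memory page it takes a minor page fault to map that page. Scattered allProcs[procno].pgxactoff accesses touch many distinct pages, so each new backend takes many such faults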

The perf data confirms that ProcArrayAdd is dominated by minor page fault overhead on the allProcs[procno].pgxactoff = index line.
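
The first-touch cost is not specific to PostgreSQL and can be observed in isolation. The following is a minimal POSIX sketch, not taken from the thread: it uses private anonymous memory instead of PostgreSQL's shared memory segment, which is a simplification, and the function names are hypothetical. It writes one byte to each page of a fresh mapping and reports how many minor faults the process took while doing so.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no I/O) page faults taken so far by this process, or -1 on error. */
long
minor_faults_so_far(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Map npages of fresh anonymous memory, write one byte per page, and
 * return how many minor faults that took, or -1 on error.
 */
long
faults_for_first_touch(size_t npages)
{
    long           page_size = sysconf(_SC_PAGESIZE);
    size_t         len;
    void          *base;
    volatile char *mem;
    long           before;
    long           after;

    assert(npages > 0);
    if (page_size <= 0)
        return -1;
    len = npages * (size_t) page_size;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    mem = base;

    before = minor_faults_so_far();
    for (size_t i = 0; i < npages; i++)
        mem[i * (size_t) page_size] = 1;    /* first write maps the page */
    after = minor_faults_so_far();

    munmap(base, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

The exact count depends on the kernel and on settings such as transparent huge pages, so the result is best read as a relative measure: touching one byte in each of many pages costs roughly one fault per page, regardless of how few bytes are written.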

Proposed Solution: Separate pgxactoff Array

The patch (0001-This-patch-reduce-connection-init-close-time.patch) extracts the pgxactoff field from the PGPROC structure into a separate dense shared-memory array indexed by process number:

ProcGlobal->pgxactoffs[procno] = index;  // instead of allProcs[procno].pgxactoff

Why This Helps
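
Packing the offsets into their own array places consecutive entries sizeof(int) bytes apart instead of one PGPROC apart, so a pass over all offsets touches far fewer pages. The sketch below is illustrative arithmetic only: the function names are hypothetical, and any PGPROC size passed in is an assumed placeholder, since the real size varies by version and build.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to hold `bytes` bytes, rounding up. */
size_t
pages_spanned(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Pages spanned when the offset lives inside each per-backend struct. */
size_t
pages_embedded(size_t nprocs, size_t struct_size, size_t page_size)
{
    return pages_spanned(nprocs * struct_size, page_size);
}

/* Pages spanned when the offsets are packed into a dense int array. */
size_t
pages_dense(size_t nprocs, size_t page_size)
{
    return pages_spanned(nprocs * sizeof(int), page_size);
}
```

For example, assuming an 880-byte PGPROC and 4-byte int, pages_embedded(8192, 880, 4096) gives 1760 pages while pages_dense(8192, 4096) gives 8. Because the assumed struct is smaller than a page, every one of those 1760 pages contains at least one pgxactoff field, so updating all offsets touches all of them.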

Results

Clients   Without Patch (warmup)   With Patch (warmup)   Improvement
512       ~500 ms                  ~215 ms               2.3x
1024      ~1000 ms                 ~920 ms               1.1x
2048      ~2240 ms                 ~1800 ms              1.2x
4096      ~6140 ms                 ~3740 ms              1.6x
8192      ~18840 ms                ~8100 ms              2.3x

Crucially, the patch eliminates the degradation between iterations — performance remains stable across runs rather than worsening.

Key Technical Debate

Huge Pages Question

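Matthias van de Meent observes that the entire investigation uses huge_pages=off, and that PostgreSQL is generally not optimized for small-page configurations. With huge pages (2MB on x86-64), the same PGPROC array would span roughly 512 times fewer pages than with 4KB pages, so far fewer page faults would be needed to touch all of it.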

The question of whether this optimization matters in production (where huge_pages=on is recommended) remains open — no data with huge pages has been presented.

Measurement Methodology Concerns

Matthias initially questions whether the quadratic behavior is actually in PostgreSQL or in pgbench itself.

Patch Review Issues

Matthias identifies several technical issues with the patch:

  1. Alignment: The pgxactoffs array allocation doesn't account for alignment requirements when TotalProcs * sizeof(statusFlags) isn't a multiple of sizeof(int)
  2. Indirection cost: Previously pgxactoff was a direct offset from the PGPROC pointer; now it requires a separate pointer dereference through ProcGlobal->pgxactoffs
  3. API design: Suggests macro definitions like ProcGetXactOff(procno) and ProcGetMyXactOff() to avoid redundant procno-from-PGPROC calculations
  4. Code style: The add_size/mul_size pattern should be used for shared memory size calculations
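
The alignment and size-calculation points can be illustrated with a small standalone sketch. It assumes the new array is carved out of the same allocation immediately after the one-byte statusFlags array, as the review comment implies. checked_add and checked_mul are hypothetical stand-ins for PostgreSQL's add_size and mul_size (which raise an error on overflow rather than asserting), and the remaining names are likewise invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for add_size(): addition that refuses to overflow. */
size_t
checked_add(size_t a, size_t b)
{
    assert(a <= SIZE_MAX - b);
    return a + b;
}

/* Stand-in for mul_size(): multiplication that refuses to overflow. */
size_t
checked_mul(size_t a, size_t b)
{
    assert(b == 0 || a <= SIZE_MAX / b);
    return a * b;
}

/* Round offset up to the next multiple of align (a power of two). */
size_t
align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return checked_add(offset, align - 1) & ~(align - 1);
}

/*
 * Byte offset at which an int array can safely start when it directly
 * follows total_procs one-byte status flags in the same allocation.
 */
size_t
pgxactoffs_start(size_t total_procs)
{
    size_t flags_bytes = checked_mul(total_procs, sizeof(uint8_t));

    return align_up(flags_bytes, sizeof(int));
}

/* Total bytes for the flags, any padding, and the int array. */
size_t
flags_plus_offsets_size(size_t total_procs)
{
    return checked_add(pgxactoffs_start(total_procs),
                       checked_mul(total_procs, sizeof(int)));
}
```

With 5 backends and a 4-byte int, the flags occupy 5 bytes, so the int array must start at offset 8 rather than 5; without the rounding step the array would be misaligned whenever the backend count is not a multiple of sizeof(int).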

The Fast Connection Rate Patch

An auxiliary issue surfaces: at very high connection rates (many thousands of connections on a fast multi-core server), the kernel's socket backlog can overflow, producing "Resource temporarily unavailable" errors. The 0001-Fix-fast-connection-rate-issue.patch works around this by adjusting kernel parameters and/or pgbench behavior, though this patch is not the focus of discussion.

Open Questions

  1. Does the quadratic behavior persist with huge_pages=on? This is the critical missing data point
  2. What is the regression cost of the additional indirection for pgxactoff access in hot paths like GetSnapshotData()?
  3. Is the root cause actually in ProcArrayAdd, or elsewhere? The connection path involves fork(), shared memory attachment, catalog access, and authentication, any of which could have a per-connection cost that grows with the number of existing connections
  4. Would a connection pooler (PgBouncer, built-in) make this moot for real workloads? 16384 direct connections is extreme for production