Centralised architecture detection

First seen: 2026-04-09 01:01:41+00:00 · Messages: 8 · Participants: 5

Latest Update

2026-06-04 · claude-opus-4-6

Centralised Architecture Detection in PostgreSQL

The Core Problem

PostgreSQL's codebase relies on compiler-defined preprocessor macros to detect CPU architecture and conditionally compile architecture-specific optimizations. However, different compilers define different macros for the same architecture:

The detection logic is scattered across multiple files, and critically, several code paths fail to detect x86 on MSVC/Visual Studio builds. This has concrete, measurable performance consequences:

Real Performance Impact on Windows/MSVC Builds

  1. pg_{read,write}_barrier() falls back to pg_memory_barrier() — On x86, read and write barriers are no-ops (due to x86's strong memory ordering: stores are ordered w.r.t. stores, loads w.r.t. loads). Falling back to a full memory barrier introduces unnecessary mfence or lock instructions in hot paths involving spinlocks and lightweight locks.

  2. pg_spin_delay() compiles to nothing — On x86, this should emit a PAUSE instruction, which hints to the CPU that a spin-wait loop is executing, improving performance on hyperthreaded cores by reducing pipeline flush penalties and yielding execution resources to the sibling hyperthread.

  3. PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY is undefined — On x86-64, aligned 8-byte reads and writes are guaranteed atomic by the architecture. Without this macro, PostgreSQL must use more expensive atomic operations or locking where a simple load/store would suffice (e.g., in certain WAL and shared memory access patterns).

The root cause is that src/include/port/atomics/arch-x86.h is never included on MSVC builds because the detection only checks GCC-style macros. Thomas Munro confirmed this empirically: adding #error to arch-x86.h produces failures on MinGW CI but passes on Visual Studio CI — proving VS never hits that header.

Proposed Solutions

Approach 1: PG_ARCH_XXX Macros (Thomas Munro)

Define PostgreSQL-native macros in a central location:

/* In c.h or port/pg_cpu.h */
#if defined(__x86_64__) || defined(_M_AMD64)
#define PG_ARCH_X86 1
#define PG_ARCH_X86_64 1
#elif defined(__i386__) || defined(_M_IX86)
#define PG_ARCH_X86 1
#define PG_ARCH_X86_32 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#define PG_ARCH_ARM 1
#define PG_ARCH_ARM_64 1
/* ... etc for LOONGARCH, MIPS, PPC, RISCV, S390, SPARC ... */
#endif

The naming convention provides both a generic family macro (PG_ARCH_X86) and specific width macros (PG_ARCH_X86_64), covering architectures: ARM, LoongArch, MIPS, PPC, RISC-V, S390, SPARC, x86.

Advantages: Clean, self-documenting, PostgreSQL-idiomatic, provides both family and bitwidth detection.

Disadvantages: More code churn; future patch submitters may not know to use PG_ARCH_XXX and will submit patches using __x86_64__ directly.

Approach 2: Standardize on GCC Spellings (Tom Lane)

Instead of inventing new macros, ensure GCC's __x86_64__, __aarch64__, etc. are always defined, even on non-GCC compilers:

/* In c.h */
#if defined(_M_AMD64) && !defined(__x86_64__)
#define __x86_64__
#endif
#if defined(_M_IX86) && !defined(__i386__)
#define __i386__
#endif
/* ... */
#else
#error "unrecognized CPU architecture"
#endif

Advantages: Minimal code churn — only need to remove the few alternative spellings scattered in the tree. Patch submitters are already familiar with GCC macros. Precedent exists (the old solaris.h approach did exactly this).

Disadvantages: Defines double-underscore macros that are technically in the implementation's reserved namespace. Though as Tom notes, there's no mechanism for compilers to prevent user code from defining __ macros, and PostgreSQL has done this before.

Key Design Decision: The #else #error Branch

Tom Lane's proposal includes an #else #error clause that fires if no architecture is detected. This is architecturally important: it turns silent misdetection (the current bug) into a hard compile error, ensuring any new platform or compiler must explicitly be handled.

The BF_ASM Tangent

Dagfinn Ilmari Mannsåker noticed that the architecture detection consolidation reveals dead code: BF_ASM in the Blowfish implementation (src/backend/libpq/crypt.c or similar) has never been defined to anything but 0 since 2001, and the assembly function it would invoke (_BF_body_r) has never existed in the PostgreSQL tree. This is a separate cleanup opportunity exposed by the refactoring.

Architectural Significance

This issue matters beyond the immediate Windows performance fix because:

  1. It's a class of bug, not a single bug — Any new architecture-specific optimization added in the future is at risk of the same silent failure if detection is ad-hoc.

  2. Growing architecture diversity — With LoongArch, RISC-V, and ARM server adoption growing, PostgreSQL needs robust detection for more architectures than the traditional x86/SPARC/PPC trio.

  3. Atomics are correctness-critical — While the current failure "merely" loses performance on Windows, incorrect atomics detection could theoretically lead to data corruption on weaker memory architectures (ARM, RISC-V, PPC) if barriers were accidentally omitted rather than accidentally over-applied.

  4. Related work — The timing_clock_source thread and John Naylor's pg_cpu.h work both need reliable architecture detection, making centralization timely.

Current Status

The thread reached a natural discussion point between the two approaches but no patch was committed as of the last message. Tom Lane's preference for GCC spellings represents a pragmatic minimal-churn approach, while Thomas Munro's PG_ARCH macros are more principled. The discussion is likely to continue, potentially converging on Tom Lane's approach given his committer authority and the practical argument about reducing future patch friction.