Centralised Architecture Detection in PostgreSQL
The Core Problem
PostgreSQL's codebase relies on compiler-defined preprocessor macros to detect CPU architecture and conditionally compile architecture-specific optimizations. However, different compilers define different macros for the same architecture:
- GCC/MinGW:
__x86_64__,__aarch64__,__powerpc64__, etc. - MSVC (Visual Studio):
_M_AMD64,_M_IX86,_M_ARM64, etc. - Sun Studio:
__i386,__amd64,__sparc(no trailing underscores)
The detection logic is scattered across multiple files, and critically, several code paths fail to detect x86 on MSVC/Visual Studio builds. This has concrete, measurable performance consequences:
Real Performance Impact on Windows/MSVC Builds
-
pg_{read,write}_barrier()falls back topg_memory_barrier()— On x86, read and write barriers are no-ops (due to x86's strong memory ordering: stores are ordered w.r.t. stores, loads w.r.t. loads). Falling back to a full memory barrier introduces unnecessarymfenceorlockinstructions in hot paths involving spinlocks and lightweight locks. -
pg_spin_delay()compiles to nothing — On x86, this should emit aPAUSEinstruction, which hints to the CPU that a spin-wait loop is executing, improving performance on hyperthreaded cores by reducing pipeline flush penalties and yielding execution resources to the sibling hyperthread. -
PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITYis undefined — On x86-64, aligned 8-byte reads and writes are guaranteed atomic by the architecture. Without this macro, PostgreSQL must use more expensive atomic operations or locking where a simple load/store would suffice (e.g., in certain WAL and shared memory access patterns).
The root cause is that src/include/port/atomics/arch-x86.h is never included on MSVC builds because the detection only checks GCC-style macros. Thomas Munro confirmed this empirically: adding #error to arch-x86.h produces failures on MinGW CI but passes on Visual Studio CI — proving VS never hits that header.
Proposed Solutions
Approach 1: PG_ARCH_XXX Macros (Thomas Munro)
Define PostgreSQL-native macros in a central location:
/* In c.h or port/pg_cpu.h */
#if defined(__x86_64__) || defined(_M_AMD64)
#define PG_ARCH_X86 1
#define PG_ARCH_X86_64 1
#elif defined(__i386__) || defined(_M_IX86)
#define PG_ARCH_X86 1
#define PG_ARCH_X86_32 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#define PG_ARCH_ARM 1
#define PG_ARCH_ARM_64 1
/* ... etc for LOONGARCH, MIPS, PPC, RISCV, S390, SPARC ... */
#endif
The naming convention provides both a generic family macro (PG_ARCH_X86) and specific width macros (PG_ARCH_X86_64), covering architectures: ARM, LoongArch, MIPS, PPC, RISC-V, S390, SPARC, x86.
Advantages: Clean, self-documenting, PostgreSQL-idiomatic, provides both family and bitwidth detection.
Disadvantages: More code churn; future patch submitters may not know to use PG_ARCH_XXX and will submit patches using __x86_64__ directly.
Approach 2: Standardize on GCC Spellings (Tom Lane)
Instead of inventing new macros, ensure GCC's __x86_64__, __aarch64__, etc. are always defined, even on non-GCC compilers:
/* In c.h */
#if defined(_M_AMD64) && !defined(__x86_64__)
#define __x86_64__
#endif
#if defined(_M_IX86) && !defined(__i386__)
#define __i386__
#endif
/* ... */
#else
#error "unrecognized CPU architecture"
#endif
Advantages: Minimal code churn — only need to remove the few alternative spellings scattered in the tree. Patch submitters are already familiar with GCC macros. Precedent exists (the old solaris.h approach did exactly this).
Disadvantages: Defines double-underscore macros that are technically in the implementation's reserved namespace. Though as Tom notes, there's no mechanism for compilers to prevent user code from defining __ macros, and PostgreSQL has done this before.
Key Design Decision: The #else #error Branch
Tom Lane's proposal includes an #else #error clause that fires if no architecture is detected. This is architecturally important: it turns silent misdetection (the current bug) into a hard compile error, ensuring any new platform or compiler must explicitly be handled.
The BF_ASM Tangent
Dagfinn Ilmari Mannsåker noticed that the architecture detection consolidation reveals dead code: BF_ASM in the Blowfish implementation (src/backend/libpq/crypt.c or similar) has never been defined to anything but 0 since 2001, and the assembly function it would invoke (_BF_body_r) has never existed in the PostgreSQL tree. This is a separate cleanup opportunity exposed by the refactoring.
Architectural Significance
This issue matters beyond the immediate Windows performance fix because:
-
It's a class of bug, not a single bug — Any new architecture-specific optimization added in the future is at risk of the same silent failure if detection is ad-hoc.
-
Growing architecture diversity — With LoongArch, RISC-V, and ARM server adoption growing, PostgreSQL needs robust detection for more architectures than the traditional x86/SPARC/PPC trio.
-
Atomics are correctness-critical — While the current failure "merely" loses performance on Windows, incorrect atomics detection could theoretically lead to data corruption on weaker memory architectures (ARM, RISC-V, PPC) if barriers were accidentally omitted rather than accidentally over-applied.
-
Related work — The
timing_clock_sourcethread and John Naylor'spg_cpu.hwork both need reliable architecture detection, making centralization timely.
Current Status
The thread reached a natural discussion point between the two approaches but no patch was committed as of the last message. Tom Lane's preference for GCC spellings represents a pragmatic minimal-churn approach, while Thomas Munro's PG_ARCH macros are more principled. The discussion is likely to continue, potentially converging on Tom Lane's approach given his committer authority and the practical argument about reducing future patch friction.