Broken build on macOS (Universal / Intel): cpuid instruction not available

First seen: 2026-05-07 11:41:11+00:00 · Messages: 12 · Participants: 7

Latest Update

2026-05-08 · opus 4.7

Broken macOS Universal / Intel Builds After x86 CPU Feature Centralization

Context and Architectural Background

In the PostgreSQL 19 development cycle, two related pieces of work landed that consolidated and expanded the use of x86-specific CPU feature detection.

Prior to 16743db, the tree used defensive guards such as:

#ifdef HAVE_X86_64_POPCNTQ
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
#define TRY_POPCNT_X86_64 1
#endif
#endif

which meant cpuid-dependent code paths were only compiled when configure had actually detected a working __get_cpuid/__cpuid intrinsic. The centralized replacement dropped that guard, implicitly assuming that any x86 build target provides one of those intrinsics.

The Two Reported Failures

1. macOS Universal Builds (arm64 + x86_64, fat binaries)

Triggered by CFLAGS="-arch arm64 -arch x86_64". The build fails with "cpuid instruction not available" coming out of pg_cpu_x86.c.

Tom Lane's diagnosis is the definitive one: configure runs its feature probes exactly once, producing a single pg_config.h. With two target architectures passed simultaneously to the compiler driver, the probe is exercised against only one of them, so macros like HAVE__GET_CPUID / HAVE__CPUID end up either defined or undefined for both arches. In the "undefined for both" case, the x86 half of the fat binary can no longer compile the centralized cpuid code, hence the error. In the "defined for both" case, the arm64 half would try to include <cpuid.h> and fail.

Tom's deeper point: universal builds only appeared to work historically by accident. What really happened is that with the old guards, the x86 half silently lost its SIMD/cpuid optimizations whenever the probe happened to fail under multi-arch flags. The binary compiled, but at the cost of running an un-optimized x86 slice. The new centralized code simply exposes the latent breakage by refusing to compile rather than silently degrading.

A proper fix would require two independent configure probe passes, one per target arch, and an #ifdef __x86_64__ / #ifdef __aarch64__ gated set of macros applied per translation unit. That is non-trivial autoconf/meson work, and Tom explicitly declines it, noting Apple's trajectory away from x86.

2. Intel-only Build under Rosetta with Xcode 26.2

checksum.c:57 fails with "call to undeclared function 'x86_feature_available'". Jakob later narrowed this to a specific toolchain combination (Xcode 26.2 invoked via Rosetta) — switching to Command Line Tools 26.4 resolved it. This suggests a header-visibility or SDK quirk in that particular Xcode, not a generic Intel regression. Buildfarm animal longfin (Intel Mac mini, one macOS rev behind) and Daniel Gustafsson's Intel MBP both build cleanly, corroborating that native Intel is not broken in general.

Design Discussion and Proposed Resolutions

Three positions emerge:

  1. Disclaim multi-arch support entirely. Tom's default position: universal builds never really worked correctly, Apple is deprecating x86, and proper fat-binary support requires per-arch configure runs which nobody wants to build or maintain. Nathan Bossart concurs ("+1 ... can't get excited about it").

  2. Expand buildfarm coverage. Tobias Bussmann offers to host macOS VMs covering universal builds and cross-compilation against various Apple toolchains. Tom endorses this strongly — in PostgreSQL's culture, if you want an edge case to keep working, the cost of admission is a buildfarm animal. Without one, regressions are inevitable and will not be treated as release blockers.

  3. Make pg_cpuid() fail soft, independent of multi-arch. Tom's follow-up is the technically substantive proposal: regardless of whether universal builds are supported, the assumption "every x86 platform provides HAVE__GET_CPUID or HAVE__CPUID" is wrong on its face. pg_cpuid() should return zeroes when it has no way to issue the instruction (analogous to how pg_cpuid_subleaf() already behaves). Callers would then naturally take the fallback path.

    Lukas Fittl (who worked on the TSC-based timing code this cycle) confirms this is viable for the TSC path: if CPUID data is unavailable, the code already falls back to the system clock and disables TSC use. He defers to John Naylor on whether other consumers (popcount, AVX2 checksum, etc.) can tolerate an "all zeroes" feature vector — which they functionally should, since zeroes mean "no features advertised" and every consumer must already have a scalar fallback.

This third proposal is the likely actual fix. It is small, localized to pg_cpu_x86.c, and restores the "silently degrade" semantics that existed implicitly before centralization — but now as an explicit, documented API contract rather than an accident of #ifdef guards.

Key Technical Insight: Centralization Changed the Failure Mode

The move from per-feature #ifdef guards to a centralized x86_feature_available() API is architecturally correct — it eliminates duplication and makes adding new feature-gated paths (like AVX2 checksums and TSC) trivial. But the refactor changed the failure mode from "silently compile without optimization" to "hard compile error" when cpuid intrinsics are unavailable on an x86 target. For mainstream x86 builds this is fine (intrinsics are always present), but it removed the escape hatch that had been accidentally protecting universal builds and unusual toolchain configurations.

The fix is to reintroduce that escape hatch at the API boundary rather than at every call site: pg_cpuid() returning zero when it cannot execute is a clean way to say "no features detected" and forces every consumer to have a scalar fallback path — which is already required for runtime dispatch on non-x86 architectures anyway.

Who Carries Weight Here

Likely Outcome