Broken build on macOS (Universal / Intel): cpuid instruction not available

First seen: 2026-05-07 11:41:11+00:00 · Messages: 12 · Participants: 7

Latest Update

2026-05-08 · opus 4.7

Broken macOS Universal / Intel Builds After x86 CPU Feature Centralization

Context and Architectural Background

In the PostgreSQL 19 development cycle, two related pieces of work landed that consolidated and expanded the use of x86-specific CPU feature detection.

Prior to 16743db, the tree used defensive guards such as:

#ifdef HAVE_X86_64_POPCNTQ
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
#define TRY_POPCNT_X86_64 1
#endif
#endif

which meant cpuid-dependent code paths were only compiled when configure had actually detected a working __get_cpuid/__cpuid intrinsic. The centralized replacement dropped that guard, implicitly assuming that any x86 build target provides one of those intrinsics.

The Two Reported Failures

1. macOS Universal Builds (arm64 + x86_64, fat binaries)

Triggered by CFLAGS="-arch arm64 -arch x86_64". The build fails with "cpuid instruction not available" coming out of pg_cpu_x86.c.

Tom Lane's diagnosis is the definitive one: configure runs its feature probes exactly once, producing a single pg_config.h. With two target architectures passed simultaneously to the compiler driver, the probe is exercised against only one of them, so macros like HAVE__GET_CPUID / HAVE__CPUID end up either defined or undefined for both arches. In the "undefined for both" case, the x86 half of the fat binary can no longer compile the centralized cpuid code, hence the error. In the "defined for both" case, the arm64 half would try to include <cpuid.h> and fail.

Tom's deeper point: universal builds only appeared to work historically by accident. What really happened is that with the old guards, the x86 half silently lost its SIMD/cpuid optimizations whenever the probe happened to fail under multi-arch flags. The binary compiled, but at the cost of running an un-optimized x86 slice. The new centralized code simply exposes the latent breakage by refusing to compile rather than silently degrading.

A proper fix would require two independent configure probe passes, one per target arch, and an #ifdef __x86_64__ / #ifdef __aarch64__ gated set of macros applied per translation unit. That is non-trivial autoconf/meson work, and Tom explicitly declines it, noting Apple's trajectory away from x86.

2. Intel-only Build under Rosetta with Xcode 26.2

checksum.c:57 fails with "call to undeclared function 'x86_feature_available'". Jakob later narrowed this to a specific toolchain combination (Xcode 26.2 invoked via Rosetta) — switching to Command Line Tools 26.4 resolved it. This suggests a header-visibility or SDK quirk in that particular Xcode, not a generic Intel regression. Buildfarm animal longfin (Intel Mac mini, one macOS rev behind) and Daniel Gustafsson's Intel MBP both build cleanly, corroborating that native Intel is not broken in general.

Design Discussion and Proposed Resolutions

Three positions emerge:

  1. Disclaim multi-arch support entirely. Tom's default position: universal builds never really worked correctly, Apple is deprecating x86, and proper fat-binary support requires per-arch configure runs which nobody wants to build or maintain. Nathan Bossart concurs ("+1 ... can't get excited about it").

  2. Expand buildfarm coverage. Tobias Bussmann offers to host macOS VMs covering universal builds and cross-compilation against various Apple toolchains. Tom endorses this strongly — in PostgreSQL's culture, if you want an edge case to keep working, the cost of admission is a buildfarm animal. Without one, regressions are inevitable and will not be treated as release blockers.

  3. Make pg_cpuid() fail soft, independent of multi-arch. Tom's follow-up is the technically substantive proposal: regardless of whether universal builds are supported, the assumption "every x86 platform provides HAVE__GET_CPUID or HAVE__CPUID" is wrong on its face. pg_cpuid() should return zeroes when it has no way to issue the instruction (analogous to how pg_cpuid_subleaf() already behaves). Callers would then naturally take the fallback path.

    Lukas Fittl (who worked on the TSC-based timing code this cycle) confirms this is viable for the TSC path: if CPUID data is unavailable, the code already falls back to the system clock and disables TSC use. He defers to John Naylor on whether other consumers (popcount, AVX2 checksum, etc.) can tolerate an "all zeroes" feature vector — which they functionally should, since zeroes mean "no features advertised" and every consumer must already have a scalar fallback.

This third proposal is the likely actual fix. It is small, localized to pg_cpu_x86.c, and restores the "silently degrade" semantics that existed implicitly before centralization — but now as an explicit, documented API contract rather than an accident of #ifdef guards.

Key Technical Insight: Centralization Changed the Failure Mode

The move from per-feature #ifdef guards to a centralized x86_feature_available() API is architecturally correct — it eliminates duplication and makes adding new feature-gated paths (like AVX2 checksums and TSC) trivial. But the refactor changed the failure mode from "silently compile without optimization" to "hard compile error" when cpuid intrinsics are unavailable on an x86 target. For mainstream x86 builds this is fine (intrinsics are always present), but it removed the escape hatch that had been accidentally protecting universal builds and unusual toolchain configurations.

The fix is to reintroduce that escape hatch at the API boundary rather than at every call site: pg_cpuid() returning zero when it cannot execute is a clean way to say "no features detected" and forces every consumer to have a scalar fallback path — which is already required for runtime dispatch on non-x86 architectures anyway.

Who Carries Weight Here

Likely Outcome