Technical Analysis: Cirrus CI Shutdown and PostgreSQL CI Infrastructure Migration
The Core Problem
Cirrus CI, the continuous integration platform that PostgreSQL's community development infrastructure depends on, announced that it will shut down effective June 1, 2026. This creates an urgent infrastructure crisis for two critical development workflows:
- cfbot — the automated system that runs CI on every patch submitted to the commitfest, providing green/red status indicators visible to reviewers and authors
- Personal repository CI — developers (especially committers) running CI on their own GitHub forks before pushing commits or submitting patches
The resource consumption numbers Andres Freund provided give a sense of the scale: ~1,464 core-hours/day across all CI jobs, of which ~396 are Windows (expensive due to licensing), plus macOS on self-hosted runners. This is not a trivial workload to relocate.
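To put those figures in perspective, the arithmetic can be sketched as follows. The two inputs are the approximate numbers quoted in the thread; the function and constant names are illustrative, and the results are plain averages that say nothing about peak load.

```python
# Approximate daily CI usage figures quoted in the thread.
TOTAL_CORE_HOURS_PER_DAY = 1464
WINDOWS_CORE_HOURS_PER_DAY = 396


def average_busy_cores(core_hours_per_day: float) -> float:
    """Average number of cores kept busy around the clock."""
    return core_hours_per_day / 24


def share_percent(part: float, whole: float) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


if __name__ == "__main__":
    cores = average_busy_cores(TOTAL_CORE_HOURS_PER_DAY)
    windows = share_percent(WINDOWS_CORE_HOURS_PER_DAY, TOTAL_CORE_HOURS_PER_DAY)
    print(f"average busy cores: {cores:.1f}")
    print(f"Windows share of core-hours: {windows:.1f}%")
```

On these inputs the workload averages 61 continuously busy cores, with Windows accounting for roughly 27% of the core-hours.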
Why This Matters Architecturally
PostgreSQL's cross-platform correctness guarantees are central to its value proposition. The project supports Linux, macOS, Windows, FreeBSD, NetBSD, and OpenBSD — each with distinct system call semantics, compiler behaviors, and filesystem characteristics. CI that covers these platforms catches:
- Endianness and alignment issues
- Platform-specific atomics/memory model bugs
- Windows-specific path handling, socket, and process model differences
- BSD-specific signal handling and syscall semantics
- Compiler-specific optimization bugs (MSVC vs GCC vs Clang)
The loss of multi-platform CI would represent a significant regression in development velocity and code quality. Features like Andres's Asynchronous I/O (AIO) work — which requires testing across io_uring, kqueue, and Windows IOCP — would be particularly impacted.
Proposed Solutions and Design Tradeoffs
Short-term: GitHub Actions Migration (Consensus Path)
The thread converges on GitHub Actions as the only viable short-term solution given the roughly two-week window. Jelte Fennema-Nio produced a working GitHub Actions workflow (with AI assistance from Claude Code) that achieves green builds across all previously supported platforms, including the BSDs via cross-platform-actions/action, which uses nested QEMU virtualization.
Key technical challenges identified:
- io_uring disabled: On GitHub Actions runners it appeared to be blocked at the host kernel level. Bilal found a workaround via `prlimit --memlock=unlimited`, suggesting the issue was memlock limits, not kernel support.
- No pre-built images: Dependencies must be re-downloaded each run, increasing build times significantly. The existing `pg-vm-images` infrastructure uploads to GCP and needs adaptation.
- Log visibility: GitHub Actions logs require authentication, a regression from Cirrus CI's public URLs, which were specifically chosen so mailing list discussions could link to build results without requiring GitHub accounts.
- ccache persistence: No cross-run cache state, meaning full rebuilds every time.
- cfbot integration: Not yet implemented; Thomas Munro's domain.
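The memlock observation in the first item can be checked from inside a CI job. The sketch below is a diagnostic only, not the workaround from the thread (which is the `prlimit` command quoted above): it reads the locked-memory limit of the current process using Python's standard `resource` module on a Unix-like runner. The helper names are hypothetical, and whether a low limit is the actual cause on a given runner remains an assumption.

```python
import resource


def memlock_limit_bytes():
    """Return the soft RLIMIT_MEMLOCK in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


def describe_memlock(limit):
    """Format a memlock limit for a CI log line."""
    if limit is None:
        return "memlock: unlimited"
    return f"memlock: {limit // 1024} KiB"


if __name__ == "__main__":
    # Print the limit before running tests that depend on io_uring.
    print(describe_memlock(memlock_limit_bytes()))
```

Logging this value before and after applying the workaround would show whether the limit actually changed for the test process.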
Long-term: Self-hosted Open Source CI
Multiple participants (Peter Eisentraut, Alexander Korotkov, Thomas Munro) advocate for eventually running self-hosted open source CI to achieve "capitalism-proof" infrastructure. Specific proposals include:
- Woodpecker CI (David Wheeler): Go-based, Forgejo-integrated, has local mode
- QEMU-based universal image infrastructure (Thomas Munro): publishing standardized QEMU images at `ci.postgresql.org/images/qemu/` that work in multiple contexts (local development, public cloud VMs, GitHub Actions, and cfbot's own infrastructure)
- Sponsored cloud + open source CI (Alexander Korotkov): self-host open source CI software on cheap cloud with sponsorship
Thomas Munro's QEMU image proposal is the most architecturally comprehensive: it decouples the "what to test" (images) from "where to test" (CI platform), making the project resilient to any single provider's shutdown. The same images would serve local development, personal CI, and cfbot.
Resource Optimization
Robert Haas raises an important efficiency concern: cfbot ran 14 complete CI cycles on a 6-line patch whose thread had only 4 messages. This suggests heuristics could reduce load:
- Tom Lane's position: Run CI promptly on new submissions (important for first-time contributors), but reduce periodic bit-rot re-testing
- Michael Paquier: Reduce frequency for patches under 20-50 lines
- Euler Taveira: Don't restrict by patch size (small patches can break things), but allow manual triggering instead of automatic re-runs
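One way to read these positions together is as a small scheduling policy. The sketch below is hypothetical and does not describe how cfbot works: the trigger names, the `Policy` fields, and the seven-day interval are all assumptions chosen for illustration. It runs immediately on new patch versions, throttles periodic bit-rot re-tests, and allows manual runs.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    NEW_VERSION = "new_version"  # a new patch version was posted
    PERIODIC = "periodic"        # scheduled bit-rot re-test
    MANUAL = "manual"            # someone explicitly requested a run


@dataclass
class Policy:
    periodic_interval_days: int = 7  # assumed value, not taken from the thread
    allow_manual: bool = True


def should_run(trigger: Trigger, days_since_last_run: int, policy: Policy) -> bool:
    """Decide whether a CI cycle should start for a patch."""
    if trigger is Trigger.NEW_VERSION:
        return True  # prompt feedback on new submissions
    if trigger is Trigger.MANUAL:
        return policy.allow_manual
    # Periodic re-tests only run once the interval has elapsed.
    return days_since_last_run >= policy.periodic_interval_days


if __name__ == "__main__":
    policy = Policy()
    print(should_run(Trigger.NEW_VERSION, 0, policy))
    print(should_run(Trigger.PERIODIC, 2, policy))
```

The size-based idea is deliberately left out of this sketch, since the thread disagrees on whether patch size is a safe signal.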
Key Design Decisions and Disagreements
Self-hosted vs. Proprietary
| Position | Advocates | Argument |
|---|---|---|
| Proprietary (GH Actions) is fine | Jelte Fennema-Nio | Self-hosted can be abandoned too; GitHub will outlive underfunded OSS CI |
| Self-hosted is critical | Bruce Momjian, Peter Eisentraut | Proprietary becomes expensive/obsolete; GitHub already tried per-minute fees for self-hosted runners |
| Both (layered approach) | Thomas Munro | Build capitalism-proof base layer, use commercial services as convenience layer |
BSD Platform Support
| Position | Advocates | Argument |
|---|---|---|
| BSDs less important | Jelte | Signal-to-flakiness ratio too low; BSDs rarely catch issues Linux+macOS miss |
| BSDs important | Bilal Yavuz | OpenBSD catches unique issues; flakiness is due to building images from scratch, not inherent |
Who Should Pay
- Heikki Linnakangas: Active contributors can pay for their own CI or self-host; free credits not important
- Peter Eisentraut: Priority should be lowering barriers for new/occasional contributors
- Jelte: Committers should be able to grant limited CI hours to new contributors
The Emergency Timeline
The thread spans April 9 to May 18, 2026 — but real action only happens in the final two weeks. Jelte's May 18 message is essentially a fire alarm: "In less than two weeks we won't have a working CI anymore." The patch he produces is explicitly described as AI-generated with cursory review, reflecting the urgency. Bilal's immediate response offering to take over and merge his own parallel work suggests the community recognized the deadline pressure.
Technical Details of the GitHub Actions Implementation
From what's described, the workflow:
- Uses Docker containers for Linux (leveraging existing pre-built images)
- Uses `cross-platform-actions/action` for FreeBSD/NetBSD/OpenBSD (QEMU nested virtualization)
- Runs native for Windows and macOS
- Uses 4-core runners (matching Cirrus configuration)
- Lacks: ccache, artifact archival, public log URLs, cfbot integration
Bilal's alternative approach introduces helper scripts (install-deps.sh, configure.sh, build.sh, test.sh) that abstract CI-provider-specific details — a design that supports future migration to yet another platform.
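The appeal of that design is that a provider's configuration shrinks to calling the same four stages in order. A minimal sketch of the idea follows; the script names come from the thread, while the directory layout, the `sh` invocation, and passing the platform name as an argument are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Stage scripts named in the thread, in execution order.
STAGES = ["install-deps.sh", "configure.sh", "build.sh", "test.sh"]


def stage_commands(script_dir: str, platform: str) -> list[list[str]]:
    """Build one command line per stage; the argument convention is hypothetical."""
    base = PurePosixPath(script_dir)
    return [["sh", str(base / name), platform] for name in STAGES]


if __name__ == "__main__":
    # Any CI provider would only need to execute these commands in sequence.
    for cmd in stage_commands("ci", "freebsd"):
        print(" ".join(cmd))
```

Under this arrangement, moving to another CI provider would mostly mean rewriting the thin layer that invokes these commands, not the build logic itself.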
Unresolved Issues
- Public log access: No solution identified; may require a separate log hosting service
- macOS runners: Self-hosted Mac fleet management unclear under new regime
- cfbot integration: Thomas Munro's domain, not addressed in patches yet
- Image pipeline: pg-vm-images currently targets GCP; needs QEMU/container output paths
- Cost model: Daily 1,464 core-hours on GitHub Actions would be expensive without donated/sponsored runners
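For the cost question, the quoted core-hours can be converted into billable runner time. The sketch below assumes every job runs on a 4-core runner, matching the configuration described earlier, and leaves the per-minute price as an input because no price is given in the thread. Hosted runner rates typically vary by operating system and runner size, so a single rate is a simplification.

```python
def runner_minutes_per_day(core_hours_per_day: float, cores_per_runner: int) -> float:
    """Convert daily core-hours into daily runner-minutes."""
    return core_hours_per_day / cores_per_runner * 60


def daily_cost(runner_minutes: float, price_per_minute: float) -> float:
    """Daily cost at a caller-supplied per-minute rate (no rate is assumed here)."""
    return runner_minutes * price_per_minute


if __name__ == "__main__":
    minutes = runner_minutes_per_day(1464, 4)
    print(f"runner-minutes per day: {minutes:.0f}")
```

With the figures from the thread, 1,464 core-hours on 4-core runners corresponds to 21,960 runner-minutes per day, which is the quantity a sponsor or billing plan would need to cover.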