Analysis: Adding errdetail() with PID/UID of Signal Sender
Core Problem and Motivation
When a PostgreSQL backend is terminated via SIGTERM (e.g., from pg_terminate_backend(), pg_ctl stop, or an external kill), the user-visible message has historically been terse:
FATAL: terminating connection due to administrator command
This leaves DBAs with no forensic information about who sent the signal. In multi-tenant or shared environments, distinguishing between a legitimate pg_ctl stop, a rogue administrator, an OOM-killer, or a misbehaving monitoring script requires correlating timestamps across multiple logs. The proposal from Jakub Wartak aims to surface the sender's PID and UID directly in the error detail, leveraging POSIX SA_SIGINFO / siginfo_t to extract si_pid and si_uid inside the signal handler.
Architecturally this matters because signals are one of the few cross-process communication channels in PostgreSQL that previously carried no provenance into the logging subsystem. Adding this plumbing turns an opaque event into an auditable one.
Evolution of the Design — Three Successive Architectures
v1–v6: Direct globals set inside the wrapper signal handler
The early patches added HAVE_SA_SIGINFO autoconf/meson probes and extended wrapper_handler() in src/port/pqsignal.c to take the siginfo_t *info, void *context signature. The wrapper stashed info->si_pid and info->si_uid into file-scope globals (proc_die_sender_pid, proc_die_sender_uid, later ProcDieSenderPid/Uid), which die() / ProcessInterrupts() / SyncRepWaitForLSN() would then read and format into errdetail().
Review cycles refined several secondary concerns:
- Style guide compliance (capitalizing the first word of DETAIL, full sentences ending in a period, two spaces between sentences) per Evan Chao and Jim Jones, matching
doc/src/sgml/errors.sgml. - Atomicity of the globals — debate over
sig_atomic_tvsuint32.sig_atomic_tis typicallyint, which bounds the representable PID range but guarantees async-signal-safe single-copy writes. - Type portability — Windows lacks
pid_t/uid_tnatively (thoughwin32_port.htypedefs them asint). Settled onintafter oscillation betweenuint32,long long, andpid_t. - Stale-data hazard — Chao pointed out that if a backend receives SIGTERM but then dies for an unrelated reason, the globals would mislead. Fix: in
ProcessInterrupts(), copy to locals and zero the globals beforeereport. - Inappropriate sites — the query-cancel path in
SyncRepWaitForLSNis not SIGTERM-driven and shouldn't log sender info.
Andrew Dunstan committed v6 on 2026-04-07.
Andres's objection — layering violation
Shortly after commit, Andres Freund raised three serious objections:
- Translatability: conditional
errdetail()appended via theproc_die_sender_pid == 0 ? 0 : errdetail(...)trick produces a separate translatable string for every callsite. - Layering:
pqsignal.c'swrapper_handlerhas no business knowing that SIGTERM specifically means "process death" and thus no business settingProcDieSenderPid. Not every SIGTERM handler wants this (e.g., some auxiliary processes have custom handlers). - Signal nesting: On platforms where signal handlers can be interrupted by other handlers, globals set by the wrapper can be overwritten before the inner handler reads them — a classic async-signal race.
Andres's verdict: "nowhere near ready to have been committed." This is notable because Andres is a senior committer with deep expertise in PostgreSQL's signal/latch infrastructure (author of much of the latch and async notification code), so his architectural objections carried decisive weight.
The rewrite: extending SIGNAL_ARGS itself
Andres sketched the correct fix: rather than smuggle info through globals, change the signal handler ABI so every handler receives a platform-agnostic pg_signal_info * as a second argument. Key components:
typedef struct pg_signal_info {
pid_t pid; /* 0 if unknown */
uid_t uid; /* only meaningful when pid != 0 */
} pg_signal_info;
#define SIGNAL_ARGS int postgres_signal_arg, const pg_signal_info *pg_siginfo
The wrapper in pqsignal.c translates platform-specific siginfo_t (when SA_SIGINFO is available) into the postgres-neutral struct, or passes a zeroed one otherwise. Individual handlers like die() can then set ProcDieSenderPid/Uid from their argument — no globals shared across the handler boundary, no nesting hazard, and pqsignal.c stops knowing anything domain-specific.
This required an ecosystem-wide change: every pass of SIG_IGN / SIG_DFL to pqsignal() now produces a function-pointer type mismatch warning. The resolution (suggested by Chao) was to define:
#define PG_SIG_DFL (pqsigfunc) (pg_funcptr_t) SIG_DFL
#define PG_SIG_IGN (pqsigfunc) (pg_funcptr_t) SIG_IGN
The intermediate cast through pg_funcptr_t silences -Wcast-function-type-mismatch from Clang. Every callsite across the tree was updated from SIG_IGN/SIG_DFL to PG_SIG_IGN/PG_SIG_DFL.
Windows / mingw portability pain
Jakub did the heavy lifting on Windows. Several compounding issues:
c.hdefinesSIGNAL_ARGSbeforeport.his included, butpid_t/uid_tare typedef'd inwin32_port.h(chicken-and-egg). Workaround considered: useuint32in the struct, but this conflicts with Linux whereuid_tis__u32andpid_tis__s32. Final approach keptpid_t/uid_twith careful header ordering.src/backend/port/win32/signal.c'spgwin32_dispatch_queued_signals()invokes handlers assig(i)— now a signature mismatch. Fixed by passing a zero-initializedpg_signal_info.USE_SIGACTIONvsUSE_SIGINFOguards needed disentangling: the Windows sigaction emulation doesn't implementsa_sigaction, sosiginfo_treferences must be gated onUSE_SIGINFOspecifically, not merelyUSE_SIGACTION.- On native Windows
signal(), the handler must take a single int, sowrapper_handlermust be used with that signature on that path.
Post-commit fallout
OpenBSD: si_pid always zero for SIGTERM
The buildfarm's OpenBSD members failed the new TAP test. Tom Lane reproduced locally on OpenBSD 7.7: HAVE_SA_SIGINFO is defined, SA_SIGINFO is set, but info->si_pid and si_uid arrive as zero even for pg_terminate_backend() from a different session. Jakub confirmed this is intentional in the OpenBSD kernel — sys/kern/kern_sig.c:initsiginfo() zeros the struct and only waitid(2) / dowait*() populate si_pid. This matches OpenBSD's general posture of minimizing cross-process information leakage.
Resolution: make the regex in the TAP test match the DETAIL line as optional rather than required. No runtime code change — the feature simply degrades gracefully on platforms that don't populate the fields.
Information-disclosure concern
Evan Chao raised a late concern: exposing UID 501 (or any OS-level UID) to an ordinary unprivileged client is a potential information leak in shared hosting. Proposed fix: use errdetail_log() instead of errdetail(), which routes the detail to the server log only, hiding it from the client. Jakub concurred. This is the last unresolved thread item in the captured discussion.
Key Takeaways
- Signal handler ABIs are a commitment. The v6 commit attempted to retrofit new information through globals; the proper fix required changing
SIGNAL_ARGSitself and touching every signal registration in the tree. This is the kind of change that tends to get rejected later in review exactly as Andres did. siginfo_tportability is weaker than the feature-test macro suggests.HAVE_SA_SIGINFObeing defined says nothing about whether the kernel bothers to fillsi_pid/si_uid. OpenBSD's behavior is a reminder that POSIX is a floor, not a contract.- Function-pointer type-checking in modern Clang (
-Wcast-function-type-mismatch) is increasingly strict and interacts badly withSIG_IGN/SIG_DFLbeingvoid(*)(int)when you extend the handler signature. Thepg_funcptr_tdouble-cast pattern is the escape hatch. - The "fix for a feature committed before feature freeze" exception (invoked by Andrew when Kirill Reshke questioned the post-freeze rework commit) is a legitimate but easily abused mechanism; in this case the rework arguably went well beyond a fix and was probably only acceptable because the alternative was reverting entirely.