Tighten SCRAM Iteration Parsing and Bound libpq PBKDF2 Work
Core Problem
This thread addresses two distinct but related security/correctness issues in PostgreSQL's SCRAM-SHA-256 authentication implementation:
1. Integer Overflow in Iteration Count Parsing (Bug Fix)
Both the backend's SCRAM verifier parser (parse_scram_secret()) and libpq's server-first-message parser use strtol() to parse the iteration count into a long, then store the result in an int. On platforms where long is 64-bit (most modern Unix systems), values in the range (INT_MAX, LONG_MAX] pass the strtol() check but are silently truncated when narrowed to int. While this likely doesn't enable invalid logins (the verifier would compute PBKDF2 with the truncated count, producing a mismatch), it represents undefined behavior territory and incorrect parsing semantics.
Additionally, the existing parsing accepts leading whitespace, sign characters, hex prefixes (0x), and other strtol() syntax that has no place in a SCRAM iteration count field per RFC 5802.
2. Client-Side Denial-of-Service via Excessive PBKDF2 Iterations
A malicious or misconfigured server can advertise an arbitrarily large SCRAM iteration count in its server-first-message. The client (libpq) then enters the PBKDF2 key derivation loop, which is computationally expensive and runs synchronously. The backend has CHECK_FOR_INTERRUPTS() inside its PBKDF2 loop, but the frontend had no equivalent mechanism. This creates a denial-of-service vector where:
- A
connect_timeoutof 1 second results in 12+ seconds of actual wall-clock time as the client grinds through millions of PBKDF2 iterations before checking the deadline. - Async
PQconnectPoll()callers have no protection at all since they never enter the blocking deadline machinery. - Clients without any timeout configured can be held indefinitely.
This is architecturally significant because it violates the contract of connect_timeout — a parameter specifically designed to bound connection attempt duration — and exposes all libpq consumers (psql, application servers, connection poolers) to a trivial resource exhaustion attack from a rogue server.
Proposed Solution: Six-Patch Series
Patch 0001: scram_parse_iterations() — Strict Parsing
Introduces a new common helper function that:
- Validates the input contains only decimal digits (no whitespace, signs, hex, etc.)
- Rejects zero (invalid per RFC 5802 — iterations must be positive)
- Rejects values exceeding
INT_MAX - Returns a proper
intdirectly, eliminating the long→int narrowing issue
This replaces ad-hoc strtol() calls in both the backend verifier parser and libpq's SCRAM exchange. The function lives in src/common since both frontend and backend need it.
Patch 0002: test_scram Test Module
A new TAP test module under src/test/modules/test_scram that directly exercises scram_parse_iterations() with edge cases (overflow, leading zeros, signs, empty strings, etc.). Kept separate from 0001 to allow the bug fix to be backpatched independently without introducing new test infrastructure to stable branches.
Patch 0003: Mirror Blocking Connection Deadline into PGconn
Stores the computed connection deadline (from connect_timeout) as a field on PGconn. This makes the deadline accessible to authentication subsystems without threading it through every function signature. The design choice to put this on PGconn rather than fe_scram_state is deliberate — it makes the deadline available to future auth mechanisms (GSSAPI, LDAP SASL, etc.).
Patch 0004: Interrupt Callback for PBKDF2
Adds an optional interrupt callback parameter to the common SCRAM PBKDF2 helper. Rather than modifying scram_SaltedPassword() directly (which would break out-of-tree consumers of src/common), introduces a scram_SaltedPasswordExt() shim that accepts the callback. The callback is invoked periodically during the PBKDF2 iteration loop.
The design choice of a callback (rather than passing a deadline directly) keeps src/common SCRAM code independent of libpq's timeout representation. The same mechanism could support other abort conditions like user-initiated cancellation.
Patch 0005: Wire Up libpq Interrupt
Connects the pieces: libpq's SCRAM exchange code passes a callback that checks whether the PGconn connection deadline has expired. If so, PBKDF2 aborts early and returns an appropriate error. Includes a TAP test that doctors a SCRAM verifier to advertise a huge iteration count and verifies connect_timeout actually interrupts the computation.
Patch 0006: scram_max_iterations Connection Parameter
Adds a new libpq connection parameter (scram_max_iterations) with a corresponding PGSCRAMMAXITERATIONS environment variable. This is a hard client-side cap applied before PBKDF2 begins — if the server advertises a count above the cap, the connection is immediately rejected.
Key design decisions:
- Default cap of 100,000 iterations (25× PostgreSQL's default of 4,096)
- Value of 0 disables the cap entirely
- Protects async
PQconnectPoll()callers who don't benefit fromconnect_timeout - Mirrors pgjdbc's existing implementation (same 100K default)
Key Design Tradeoffs and Open Questions
Default Value for scram_max_iterations
The 100K default is a behavior change for deployments using very high iteration counts (250K–1M). The author notes these exist in practice. Options:
- 100K (proposed): Protects by default, may break exotic configurations
- 0 (disabled): Preserves compatibility, requires opt-in
- Higher value: Compromise between protection and compatibility
scram_SaltedPasswordExt() Shim vs. Direct Signature Change
The shim avoids breaking out-of-tree consumers of src/common SCRAM APIs (extensions, custom auth modules). However, it adds API surface area. A direct signature change with default NULL callback would be cleaner but forces recompilation of all consumers.
Deadline Location: PGconn vs. fe_scram_state
Storing on PGconn makes it available to all auth mechanisms. Storing on fe_scram_state would be more encapsulated but would need to be replicated for each auth method.
Backpatch Strategy
- 0001 (parsing fix): Candidate for backpatch; the int-truncation fix is safe, but rejecting zero/negative stored verifiers is a subtle behavior change
- 0003-0005 (connect_timeout enforcement): Candidate for backpatch as it fixes a security-adjacent issue
- 0006 (new parameter): HEAD-only as it introduces new user-visible API
TAP Test Timing Sensitivity
The test in 0005 relies on a 1-second connect_timeout actually expiring before PBKDF2 completes. The iteration count is set high enough that no realistic hardware could finish, but this is inherently timing-sensitive — a concern for CI environments with variable load.
Architectural Implications
This patch series touches several architectural boundaries:
-
src/common shared code: The iteration parser and PBKDF2 helper serve both frontend and backend, requiring careful API design that works in both contexts.
-
libpq connection state machine: Adding deadline awareness to the SCRAM exchange means the auth protocol handlers now participate in timeout enforcement, previously handled only at the socket I/O layer.
-
Client-server trust model: The
scram_max_iterationsparameter represents a philosophical stance that the client should not blindly trust server-provided SCRAM parameters. This aligns with defense-in-depth but is a departure from the traditional model where the server dictates authentication requirements. -
Precedent for other drivers: The explicit reference to pgjdbc's implementation suggests coordination across the PostgreSQL driver ecosystem, which strengthens the case for the 100K default.