Review comments on createPartitions() in pgbench.c

First seen: 2026-05-15 04:15:42+00:00 · Messages: 1 · Participants: 1

Latest Update

2026-05-18 · claude-opus-4-6

Technical Analysis: Review Comments on createPartitions() in pgbench.c

Core Problem

This thread raises code robustness concerns about the createPartitions() function in pgbench.c, the source of PostgreSQL's built-in benchmarking tool. The function generates the DDL that creates the partitions (RANGE or HASH) of the accounts table used in pgbench's TPC-B-like workload. pgbench is not part of the database engine itself, but it is a widely used first-party tool, and errors in partition boundary calculations affect benchmark validity.

The four observations center on defensive programming practices:

1. Integer Overflow in part_size Calculation

The most technically substantive concern. In pgbench, naccounts is 100,000 (the number of account rows per unit of scale), and the total number of accounts is naccounts * scale. With a 32-bit int, that product exceeds INT_MAX (2,147,483,647) once scale reaches 21,475. The review suspects that the existing code performs this multiplication in int. If so, the result is signed integer overflow, which is undefined behavior in C and in practice usually yields a silently wrong value. The proposed fix:

int64 total = (int64) naccounts * scale;
int64 part_size = (total + partitions - 1) / partitions;

This uses explicit int64 widening before the multiplication and employs the standard ceiling-division idiom (n + d - 1) / d to ensure partition boundaries cover the entire account space without gaps.
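The widening and the ceiling division can be checked in isolation. The following is a minimal sketch, not the pgbench source: ACCOUNTS_PER_SCALE and range_part_size() are hypothetical names chosen for illustration, and int64_t stands in for PostgreSQL's int64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for pgbench's per-scale account count. */
#define ACCOUNTS_PER_SCALE 100000

/*
 * Rows per RANGE partition, rounded up so that
 * partitions * size >= total number of accounts.
 */
int64_t
range_part_size(int scale, int partitions)
{
    int64_t total;

    assert(scale > 0 && partitions > 0);

    /* Widen one operand first so the multiplication happens in 64 bits. */
    total = (int64_t) ACCOUNTS_PER_SCALE * scale;

    /* Ceiling division: (n + d - 1) / d. */
    return (total + partitions - 1) / partitions;
}
```

With scale = 30000 the total is 3,000,000,000, which does not fit in a 32-bit int. The cast has to be applied to an operand, not to the finished product: (int64_t) (a * b) would still multiply in int.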

2. Assert(partitions > 0) for Runtime Safety

Assert() is compiled out in builds configured without --enable-cassert, which is the usual case for production builds. If partitions could ever be zero (e.g., due to an argument parsing bug), the division would then fail with a division-by-zero instead of a clear diagnostic. The suggestion implies that a runtime check with an explicit error message would be more robust, though the reviewer acknowledges that the value may already be validated elsewhere in the call chain.
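One way to read the suggestion is sketched below, under the assumption that the caller can handle an error return. checked_part_size() and its error message are hypothetical and do not appear in pgbench; the sketch only contrasts a check that survives in every build with an assertion reserved for programmer errors.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: stores the partition size and returns 0 on success,
 * or reports the problem and returns -1. The partitions check stays active
 * even when assertions are compiled out.
 */
int
checked_part_size(int64_t total, int partitions, int64_t *part_size)
{
    /* Programmer error, not user input: an assertion is adequate here. */
    assert(part_size != NULL);

    /* Value that could come from the command line: check at run time. */
    if (partitions <= 0)
    {
        fprintf(stderr, "invalid number of partitions: %d\n", partitions);
        return -1;
    }

    *part_size = (total + partitions - 1) / partitions;
    return 0;
}
```

The output parameter is left untouched on failure, so a caller that ignores the return value does not pick up a half-computed size.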

3. PQExpBuffer Reuse Without resetPQExpBuffer()

pgbench constructs SQL statements using PQExpBuffer. If one buffer is reused across loop iterations (e.g., to generate multiple CREATE TABLE ... PARTITION OF statements), each iteration must start from an empty buffer. That means either calling resetPQExpBuffer() or beginning the statement with printfPQExpBuffer(), which resets the buffer before formatting into it. If every call in the loop is an append, the SQL commands accumulate and are sent concatenated. This is a common bug pattern in PostgreSQL client-side code.
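The accumulation effect can be shown without libpq. The sketch below uses SqlBuf, a hypothetical fixed-size buffer that only mimics the reset/append behavior of PQExpBuffer; none of these names are part of the real API, and the generated SQL text is purely illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for PQExpBuffer. */
typedef struct
{
    char   data[256];
    size_t len;
} SqlBuf;

void
sqlbuf_reset(SqlBuf *buf)
{
    buf->len = 0;
    buf->data[0] = '\0';
}

void
sqlbuf_append(SqlBuf *buf, const char *text)
{
    size_t n = strlen(text);

    assert(buf->len + n < sizeof(buf->data));
    memcpy(buf->data + buf->len, text, n + 1);  /* copies the terminator too */
    buf->len += n;
}

/*
 * Build one statement per partition in a reused buffer and return how many
 * statements the buffer holds after the last iteration.
 */
int
statements_in_last_buffer(int partitions, int reset_each_time)
{
    SqlBuf buf;
    int    count = 0;

    sqlbuf_reset(&buf);
    for (int p = 1; p <= partitions; p++)
    {
        char stmt[64];

        if (reset_each_time)
            sqlbuf_reset(&buf);
        snprintf(stmt, sizeof(stmt), "create table part_%d partition of t;", p);
        sqlbuf_append(&buf, stmt);
    }

    /* Count statement terminators left in the buffer. */
    for (const char *c = buf.data; *c != '\0'; c++)
        if (*c == ';')
            count++;
    return count;
}
```

With a reset at the top of each iteration the buffer holds exactly one statement when it would be sent; without it, the third iteration would send all three.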

4. Uneven RANGE Partition Boundaries

When the total account space is not evenly divisible by the number of partitions, the last partition's upper bound must cover the remainder. The ceiling-division fix addresses this, with one side effect: the final partition's computed upper bound can exceed the actual maximum account ID, so part of its range is never populated. That is benign but potentially confusing. Conversely, if the partition size is computed with floor division and every partition has explicit bounds, the last few accounts fall into no partition, and the INSERTs for them fail during data loading.
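The difference between the two roundings is easy to see on a small case. The sketch below assumes that every partition p covers the half-open range from (p - 1) * part_size + 1 up to, but not including, p * part_size + 1, with no open-ended first or last partition; that layout and the name partition_for() are assumptions made for illustration, not taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the 1-based partition whose half-open range
 * [(p - 1) * part_size + 1, p * part_size + 1) contains id,
 * or 0 if no partition accepts it.
 */
int
partition_for(int64_t id, int64_t part_size, int partitions)
{
    assert(part_size > 0 && partitions > 0);

    for (int p = 1; p <= partitions; p++)
    {
        int64_t lo = (int64_t) (p - 1) * part_size + 1;
        int64_t hi = (int64_t) p * part_size + 1;

        if (id >= lo && id < hi)
            return p;
    }
    return 0;   /* gap: the row would be rejected */
}
```

For 10 accounts and 3 partitions, floor division gives a size of 3, so partition_for(10, 3, 3) finds no home for account 10. Ceiling division gives a size of 4, and partition_for(10, 4, 3) places it in the third partition, whose range also includes the unused IDs 11 and 12.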

Architectural Context

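pgbench partitioning support was added to allow benchmarking PostgreSQL's partitioned table performance. The createPartitions() function generates the per-partition DDL: one CREATE TABLE ... PARTITION OF statement for each RANGE or HASH partition.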

Correctness here directly affects whether pgbench's --partitions option produces a valid, gap-free partition scheme at high scale factors.

Assessment

This is an initial review posting with no patches attached and no confirmed bugs. The observations are reasonable code-quality concerns. The overflow issue is the most actionable: at scale factors of 21,475 and above (100000 * 21475 = 2,147,500,000 > INT_MAX), a multiplication carried out in 32-bit int would overflow and produce incorrect partition boundaries. While such scale factors are unusual, they are not impossible on large systems.
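The threshold quoted above follows directly from INT_MAX. As a small check, assuming a 32-bit int, the hypothetical helper below computes the smallest multiplier at which a product with a given base no longer fits:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Smallest multiplier m for which base * m exceeds INT_MAX.
 * Computed in 64 bits so the "+ 1" cannot itself overflow.
 */
int64_t
first_overflowing_multiplier(int base)
{
    assert(base > 0);
    return (int64_t) INT_MAX / base + 1;
}
```

For a base of 100000 and a 32-bit int this yields 21475, so a scale factor of 21474 is the largest whose account total still fits in int.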

The thread has received no responses yet, which is typical for cold-start review observations from new contributors, especially when no patch or reproducible failure is provided.