off-by-one in pg_repack index loop

First seen: 2026-04-14 07:34:54+00:00 · Messages: 4 · Participants: 4

Latest Update

2026-05-06 · opus 4.7

Core Problem

This thread identifies a classic off-by-one boundary check bug in code that validates an index against a list length before calling list_nth_oid(). The thread title mentions "pg_repack" but the actual code in question is inside PostgreSQL proper — it uses foreach_current_index(), a macro introduced with the modern foreach_* iteration primitives (list.c/pg_list.h) that yields the 0-based position of the current element during list iteration.

The buggy pattern was:

int pos = foreach_current_index(ind_old);
if (unlikely(list_length(ind_oids_new) < pos))
    elog(ERROR, "list of new indexes too short");
ident_idx_new = list_nth_oid(ind_oids_new, pos);

Why this is wrong

Given a list of length N, valid 0-based indices are 0 .. N-1. The guard list_length(new) < pos only fires when pos > N, meaning pos == N would slip through — and list_nth_oid(list, N) on a length-N list is an out-of-bounds access that will either assert-fail in list_nth_cell() or return garbage/crash in a non-assert build.

The correct guard is list_length(ind_oids_new) <= pos, i.e. any pos >= N must raise the error before the list_nth_oid call. This is the fix Lakshmi proposed.

Architectural significance (small, but real)

While the defect is one character (<<=), the class of bug matters because:

  1. elog(ERROR, ...) guards of this kind exist specifically as "can't happen" defensive checks. Their whole purpose is to convert a latent out-of-bounds read into a clean ereport with a stable error message before the dereference. An off-by-one in such a guard defeats its entire reason for existence — the unsafe access still happens in exactly the edge case the guard was meant to catch.
  2. foreach_current_index() is a relatively new idiom (added alongside the rewritten List implementation), and callers coming from the pre-List-rewrite world sometimes reason about indices as if they were 1-based (mirroring older SQL-level conventions). This patch is a data point showing that even committers get the boundary wrong when mixing list_length() (a count) with foreach_current_index() (a 0-based offset).

Secondary Cleanup: removing unlikely()

John Naylor's response invokes commit 913ec71d6, which established a project-wide convention that unlikely() should not be sprinkled on cold error paths where the compiler and branch predictor will already do the right thing. elog(ERROR, ...) is a pg_attribute_noreturn path; the compiler already treats it as cold, so the unlikely() adds no optimization value and just clutters the source. Naylor's comment — "it's just a noise word here" — reflects an established stylistic ruling rather than a new debate. The final pushed version therefore both fixes the comparison and drops unlikely():

if (list_length(ind_oids_new) <= pos)
    elog(ERROR, "list of new indexes too short");

Review Dynamics and Authority

There was no disagreement — the thread is a textbook example of a tiny, uncontroversial correctness fix getting a small scope expansion from a committer to align with project conventions before commit.

Implications