Available disk space per tablespace

First seen: 2025-03-13 18:10:16+00:00 · Messages: 12 · Participants: 6

Latest Update

2026-05-07 · opus 4.7

New review from Zsolt Parragi (Percona) — v5 code review

A new reviewer, Zsolt Parragi (Percona), posted a detailed code-level review of v5. This is the first substantive review feedback from someone other than Munro and introduces several concrete defects, including one genuine correctness bug that the prior analysis/review rounds missed.

1. Integer overflow in the return expression (real bug)

The most significant finding. The code:

return fst.f_bavail * fst.f_frsize;  /* available blocks times fragment size */

Parragi points out (with a Godbolt link demonstrating the generated assembly) that the multiplication is carried out in the type of the operands, and only the result is converted to the 64-bit return type. On platforms where both f_bavail and f_frsize are 32 bits wide (notably 32-bit Linux; POSIX requires only that fsblkcnt_t be an unsigned integer type and declares f_frsize as unsigned long), the product silently wraps modulo 2^32 as soon as the available space reaches 4 GiB. The fix is a cast on at least one operand, e.g. (uint64) fst.f_bavail * fst.f_frsize.
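A minimal sketch of the truncation, assuming uint32_t as a stand-in for a 32-bit unsigned long so that the effect reproduces on 64-bit hosts as well. Both helper names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/*
 * uint32_t stands in for a 32-bit "unsigned long".
 * Both helper names are hypothetical, for illustration only.
 */

/* Buggy shape: the product is formed in 32 bits, then widened. */
int64_t
avail_bytes_narrow(uint32_t bavail, uint32_t frsize)
{
    uint32_t product = (uint32_t) (bavail * frsize);   /* wraps modulo 2^32 */

    return (int64_t) product;
}

/* Fixed shape: widen one operand first so the multiply is 64-bit. */
int64_t
avail_bytes_wide(uint32_t bavail, uint32_t frsize)
{
    return (int64_t) ((uint64_t) bavail * frsize);
}
```

With 1,048,576 available blocks of 4,096 bytes each (exactly 4 GiB), avail_bytes_narrow returns 0 while avail_bytes_wide returns 4294967296.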

This matters because the whole point of the feature is multi-TB tablespaces — exactly the regime where 32-bit overflow bites. Neither Berg nor Munro flagged this across five patch revisions.

2. Windows error-handling inconsistency

The #ifdef WIN32 branch does elog(ERROR, "GetDiskFreeSpaceEx failed: error code %lu", GetLastError()), while the POSIX branch uses ereport with errcode_for_file_access() and %m. Parragi asks for symmetry between the two branches.

3. Error message wording

"could not statvfs directory \"%s\"" leaks the syscall name into a user-facing message. Parragi suggests "could not get free disk space for directory \"%s\"". This matches PostgreSQL's general style — user-facing messages describe the intent, not the syscall (compare "could not stat file" vs. "could not fsync file", where the latter is arguably a style bug of its own, but statvfs is considerably more obscure).

4. Documentation wording

"Returns the available disk space in the tablespace" is imprecise — a tablespace is a logical PG object, not a storage pool with its own free space. The reported number is the filesystem's free space, which may be shared with other tablespaces, pg_wal, the OS, etc. Suggested rewording: "returns the space on the filesystem hosting the tablespace." This also ties back to the unresolved pg_wal design question from the last round — the docs need to be honest that the number is per-filesystem, not per-tablespace.

5. Typo in the psql query

The v5 \db+ query spells the keyword wHERE (lowercase w). This is harmless at run time, since SQL keywords are case-insensitive, but it suggests the query was hand-edited after testing.

Assessment

Parragi's review is the first from a reviewer who read the code rather than the design. The overflow bug is a must-fix before commit and somewhat undermines the earlier "go for it" trajectory — v6 is required. The Windows/POSIX error-path asymmetry is a reasonable portability concern that the previous review cycle glossed over (Munro's error-handling pushback was all about POSIX). No position shifts yet; Berg has not responded.

History (1 prior analysis)
2026-05-06 · opus 4.7

Available disk space per tablespace — Technical Analysis

Core Problem

PostgreSQL exposes pg_tablespace_size() (how much a tablespace currently consumes) but has no first-class way to answer the operationally critical question: how much more data can I load? Users today must either shell out to df, run an external monitoring agent, or write a COPY PROGRAM hack. For cloud/managed PostgreSQL deployments where OS access is frequently unavailable, this is a genuine UX gap — the information is trivially available to the postmaster (which already has the tablespace directory open) but simply isn't surfaced through SQL.

This is a revival of a 5-year-old patch (originally posted 2019-11-08). Christoph Berg (Debian PostgreSQL maintainer, long-time contributor) narrows the 2019 design: rather than returning both total and available, the new pg_tablespace_avail(name|oid) returns a single scalar — the number of bytes available to unprivileged users (f_bavail on Unix, the user-quota-aware value from GetDiskFreeSpaceEx on Windows). "Total" is intentionally dropped as uninteresting; "used by PG" is already covered by pg_tablespace_size.

The statvfs vs statfs Question

The central portability question is which syscall family to use. The patch settles on POSIX statvfs() on every non-Windows platform. Two wrinkles surfaced:

  1. macOS field semantics (caught by Quan Zongliang). The original v1/v2 multiplied f_bavail * f_bsize. On macOS f_bsize is the preferred I/O size (often much larger than the real allocation unit), while f_frsize is the true fragment / allocation unit. On Linux the two are typically equal so the bug was invisible; on macOS it produced absurd values (23 TB reported on a 1 TB disk). Fix: use f_bavail * f_frsize. The Linux statvfs(3) man page does document f_frsize as the correct unit but less prominently. This is a classic cross-platform trap in filesystem stat code.

  2. The FreeBSD man page disclaimer. FreeBSD's statvfs(3) page literally says the structure is filled "with garbage ... portable applications must not depend on this." Thomas Munro (committer, portability expert) investigated and found this is a dramatic rendering of POSIX's escape hatch ("it is unspecified whether all members ... have meaningful values"). In practice FreeBSD's statvfs is a thin libc wrapper over statfs(2), the same source df uses, so results are reliable. OpenBSD behaves the same way; NetBSD has a real statvfs1 syscall with no disclaimer. Munro CI-tested NetBSD and OpenBSD and confirmed FreeBSD 14.2 manually. Conclusion: there is no reason to use the non-POSIX statfs() path; statvfs is fine universally.
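The field choice discussed above can be sketched as a small POSIX-only helper. This is an illustration under stated assumptions, not the patch's code: the name fs_avail_bytes and the -1 return convention are hypothetical, and the operands are widened before the multiply as a precaution against 32-bit wraparound.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper, not the patch's function: bytes available to
 * unprivileged users on the filesystem containing "path", or -1 on
 * failure with errno set by statvfs().
 */
int64_t
fs_avail_bytes(const char *path)
{
    struct statvfs fst;

    if (statvfs(path, &fst) != 0)
        return -1;

    /*
     * f_frsize is the allocation unit; f_bsize is only the preferred
     * I/O size and can be much larger on macOS. Both operands are
     * widened before multiplying.
     */
    return (int64_t) ((uint64_t) fst.f_bavail * (uint64_t) fst.f_frsize);
}
```

Calling fs_avail_bytes("/") should give roughly the "Avail" column that df reports for the root filesystem, in bytes.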

Berg also cross-checked gnulib's fsusage.c wrapper, which only special-cases AIX, OSF/1, pre-2.6.36 glibc, and other defunct platforms — none currently supported by PostgreSQL (Laurenz Albe noted AIX support was dropped but may return; Munro confirmed AIX has statvfs anyway).

Error Handling Design

Munro pushed back on three points, two concerning error discipline and one concerning style:

  • Silent -1 return on syscall failure. The initial patch mirrored calculate_tablespace_size()'s defensive pattern: return -1 (→ SQL NULL) rather than ereport. Munro argued this makes field debugging impossible, especially on Windows, where users have no strace-equivalent tool to see which call failed. Berg looked more carefully and found that calculate_tablespace_size actually does raise errors inside db_dir_size; it only tolerates a missing top-level directory. v4 switches to raising errors, which exposed a real failure case: ERROR: could not statvfs directory ... Too many levels of symbolic links.

  • EINTR retry. Munro wondered if the statvfs call should loop on EINTR. Berg noted that coreutils df (via gnulib) doesn't bother; Munro conceded that a retry would matter only on a filesystem that both sleeps in statvfs and ignores SA_RESTART, which is theoretical. Deferred until evidence arises.

  • Style: ! preferred over == false.
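For reference, the EINTR retry that was considered and deferred would look roughly like the sketch below. The wrapper name statvfs_retry is hypothetical; the thread did not adopt such a loop.

```c
#include <errno.h>
#include <sys/statvfs.h>

/*
 * Hypothetical wrapper: call statvfs() again while it fails with EINTR.
 * Returns 0 on success, or -1 on any other failure with errno left as
 * statvfs() set it.
 */
int
statvfs_retry(const char *path, struct statvfs *buf)
{
    int rc;

    do
    {
        rc = statvfs(path, buf);
    } while (rc != 0 && errno == EINTR);

    return rc;
}
```

The loop costs nothing in the common case, which is part of why the question was left open rather than rejected outright.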

The Permissions Problem and the v5 psql Change

Once calculate_tablespace_avail raises real errors, \db+ becomes fragile for non-superusers. pg_tablespace_size has always had the same issue — a non-superuser lacking CREATE on pg_global gets an error from \db+. v5 fixes this at the psql layer by mirroring the access check inside the meta-command's query:

CASE WHEN dbsub.dattablespace = tblspc.oid
       OR has_tablespace_privilege(tblspc.oid, 'CREATE')
       OR pg_has_role('pg_read_all_stats', 'USAGE')
     THEN pg_size_pretty(pg_tablespace_avail(tblspc.oid))
     ELSE 'No Access' END

The three-way test (own default tablespace / CREATE privilege / pg_read_all_stats role) matches the existing authorization logic used elsewhere for pg_tablespace_size. Notably, pg_tablespace_size itself was not previously guarded in psql — v5 fixes both columns together. Berg also floated making the underlying functions return NULL on insufficient permissions instead of erroring, which would be a broader behavioral change and was not adopted.

Open Design Question: pg_wal

Berg's follow-up proposal is architecturally more interesting. pg_wal frequently lives on a dedicated volume (different I/O characteristics, crash-safety concerns) but is not a tablespace: it's not in pg_tablespace, has no OID, and get_tablespace_location knows nothing about it. Three options are on the table:

  1. Synthetic pg_wal row in pg_tablespace with a reserved WALTABLESPACE_OID. Cleanest from the user's perspective — \db+ "just works" — but requires defensive checks anywhere tablespace OIDs flow into DDL (CREATE TABLE ... TABLESPACE, ALTER TABLE SET TABLESPACE, pg_class.reltablespace, etc.). The blast radius is large.
  2. Dedicated pg_wal_size() / pg_wal_avail() functions. Minimal invasiveness, but users must know to call a different function, and \db+ would need a separate UNION branch.
  3. Reserved OID recognized only by the three introspection functions (get_tablespace_location, pg_tablespace_size, pg_tablespace_avail) without a catalog row. A middle ground — avoids catalog pollution and DDL-path defensive checks, but is a slight abstraction leak.

Berg leans between (2) and (3). This is unresolved in the thread.
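Option (3) can be sketched as a path-resolution helper that recognizes a reserved OID without any catalog row. Everything here is an assumption for illustration: the function name avail_path_for is hypothetical, the value given to WALTABLESPACE_OID is a placeholder (no such OID is reserved), and the directory layout is simplified. Only the two built-in tablespace OIDs are real PostgreSQL constants.

```c
#include <stdio.h>

typedef unsigned int Oid;

/* Real PostgreSQL OIDs for the two built-in tablespaces. */
#define DEFAULTTABLESPACE_OID 1663
#define GLOBALTABLESPACE_OID  1664

/* HYPOTHETICAL: no such OID is reserved; the value is a placeholder. */
#define WALTABLESPACE_OID     9999

/*
 * Write the data-directory-relative path whose filesystem would be
 * measured for tablespace "spc". Returns the snprintf() result.
 */
int
avail_path_for(Oid spc, char *buf, size_t len)
{
    switch (spc)
    {
        case DEFAULTTABLESPACE_OID:
            return snprintf(buf, len, "base");
        case GLOBALTABLESPACE_OID:
            return snprintf(buf, len, "global");
        case WALTABLESPACE_OID:
            /* Option 3: recognized here without a pg_tablespace row. */
            return snprintf(buf, len, "pg_wal");
        default:
            return snprintf(buf, len, "pg_tblspc/%u", spc);
    }
}
```

Because the reserved OID is handled only inside introspection helpers like this one, DDL paths that accept tablespace OIDs would never see it, which is the appeal of this option over a synthetic catalog row.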

Significance

The patch itself is small but hits a real gap. The technical core is almost entirely about (a) picking the right portable syscall and the right field within its result struct, and (b) getting the error/permission surface right so \db+ remains usable for non-superusers. Munro's review has mostly converged toward commit ("go for it, call statvfs and don't worry"). The pg_wal follow-up is where the more interesting design work remains.