54.1. Database File Layout

This section describes the storage format at the level of files and directories.

All the data needed for a database cluster is stored within the cluster's data directory, commonly referred to as PGDATA (after the name of the environment variable that can be used to define it). A common location for PGDATA is /var/lib/pgsql/data. Multiple clusters, managed by different server instances, can exist on the same machine.

The PGDATA directory contains several subdirectories and control files, as shown in Table 54-1. In addition to these required items, the cluster configuration files postgresql.conf, pg_hba.conf, and pg_ident.conf are traditionally stored in PGDATA (although in PostgreSQL 8.0 and later, it is possible to keep them elsewhere).

Table 54-1. Contents of PGDATA

ItemDescription
PG_VERSIONA file containing the major version number of PostgreSQL
baseSubdirectory containing per-database subdirectories
globalSubdirectory containing cluster-wide tables, such as pg_database
pg_clogSubdirectory containing transaction commit status data
pg_multixactSubdirectory containing multitransaction status data (used for shared row locks)
pg_notifySubdirectory containing LISTEN/NOTIFY status data
pg_stat_tmpSubdirectory containing temporary files for the statistics subsystem
pg_subtransSubdirectory containing subtransaction status data
pg_tblspcSubdirectory containing symbolic links to tablespaces
pg_twophaseSubdirectory containing state files for prepared transactions
pg_xlogSubdirectory containing WAL (Write Ahead Log) files
postmaster.optsA file recording the command-line options the server was last started with
postmaster.pidA lock file recording the current server PID and shared memory segment ID (not present after server shutdown)

For each database in the cluster there is a subdirectory within PGDATA/base, named after the database's OID in pg_database. This subdirectory is the default location for the database's files; in particular, its system catalogs are stored there.

Each table and index is stored in a separate file, named after the table or index's filenode number, which can be found in pg_class.relfilenode. In addition to the main file (a/k/a main fork), each table and index has a free space map (see Section 54.3), which stores information about free space available in the relation. The free space map is stored in a file named with the filenode number plus the suffix _fsm. Tables also have a visibility map, stored in a fork with the suffix _vm, to track which pages are known to have no dead tuples. The visibility map is described further in Section 54.4.

Caution

Note that while a table's filenode often matches its OID, this is not necessarily the case; some operations, like TRUNCATE, REINDEX, CLUSTER and some forms of ALTER TABLE, can change the filenode while preserving the OID. Avoid assuming that filenode and table OID are the same. Also, for certain system catalogs including pg_class itself, pg_class.relfilenode contains zero. The actual filenode number of these catalogs is stored in a lower-level data structure, and can be obtained using the pg_relation_filenode() function.

When a table or index exceeds 1 GB, it is divided into gigabyte-sized segments. The first segment's file name is the same as the filenode; subsequent segments are named filenode.1, filenode.2, etc. This arrangement avoids problems on platforms that have file size limitations. (Actually, 1 GB is just the default segment size. The segment size can be adjusted using the configuration option --with-segsize when building PostgreSQL.) In principle, free space map and visibility map forks could require multiple segments as well, though this is unlikely to happen in practice. The contents of tables and indexes are discussed further in Section 54.5.

A table that has columns with potentially large entries will have an associated TOAST table, which is used for out-of-line storage of field values that are too large to keep in the table rows proper. pg_class.reltoastrelid links from a table to its TOAST table, if any. See Section 54.2 for more information.

Tablespaces make the scenario more complicated. Each user-defined tablespace has a symbolic link inside the PGDATA/pg_tblspc directory, which points to the physical tablespace directory (as specified in its CREATE TABLESPACE command). The symbolic link is named after the tablespace's OID. Inside the physical tablespace directory there is a subdirectory for each database that has elements in the tablespace, named after the database's OID. Tables within that directory follow the filenode naming scheme. The pg_default tablespace is not accessed through pg_tblspc, but corresponds to PGDATA/base. Similarly, the pg_global tablespace is not accessed through pg_tblspc, but corresponds to PGDATA/global.

The pg_relation_filepath() function shows the entire path (relative to PGDATA) of any relation. It is often useful as a substitute for remembering many of the above rules. But keep in mind that this function just gives the name of the first segment of the main fork of the relation — you may need to append a segment number and/or _fsm or _vm to find all the files associated with the relation.

Temporary files (for operations such as sorting more data than can fit in memory) are created within PGDATA/base/pgsql_tmp, or within a pgsql_tmp subdirectory of a tablespace directory if a tablespace other than pg_default is specified for them. The name of a temporary file has the form pgsql_tmpPPP.NNN, where PPP is the PID of the owning backend and NNN distinguishes different temporary files of that backend.