PostgreSQL Page Structure - (Slotted Pages)

From the previous blog we knew that tables are made up of 8KB pages, lets crack open a page and see what's really inside it. This is where the things get interesting because jargons like MVCC, tuple storage, visiblity rules, freespace is dependent on concept of pages.

This two-ended growth pattern is elegant. Line pointers grow down from the top, tuples grow up from the bottom, and free space is always the contiguous region in the middle. Two pointers in the page header—pd_lower and pd_upper—track the boundaries. Free space is simply pd_upper - pd_lower. Want to know if a new tuple fits? It's an O(1) subtraction.

Description Of Important Headers

Log Sequence Number (LSN)

Log Sequence Number comes first, occupying 8 bytes. every time the page is modified, postgreSQL writes a record to WAL (write ahead log) and assigns it a unique identifier (LSN). That LSN is copied into the page header. The. main reason to do so is for crash recovery. When PostgreSQL crashes and restarts, it replays the WAL starting from the last checkpoint. For each WAL record, it reads the corresponding page from disk and checks: is the page's LSN less than this WAL record's LSN? If yes, apply the change. If no, skip it—the page already has this change.

Checksum

The checksum header occupies 2 bytes. if you initialize your cluster with initdb --data-checksums postgreSQL calculates a checksum over the entire page every time it writes to the disk. when reading the page back, it recalculates and compares. if they don't match then scary error may come. The main downside is about 2-5 percent of cpu overhead for the checksum calculation and comparision.

Flag

Flag header occupies 2 bytes too. it is a bitmap of boolean properties about the page. The most important flag is PD_ALL_VISIBLE , which mirrors the visiblity map and tell us that every page in this tuple is visible to all transactions.

pd_lower and pd_upper

Both of the header takes 2 bytes. pd_lower points to the start of the free space area or end of the line pointer array where pd_upper points to the start of the start of the tuple area or end of the freespace area. when new tuple is inserted pd_upper moves down where as pd_lower moves up (towards higher page address)

pd_special field

The pd_special (2 bytes) points to a special area at the end of the page. For heap pages, this is unused and just points to byte 8192 (the end of the page). But for index pages, it points to index-specific metadata. B-tree pages, for example, store pointers to left and right sibling pages in the special area, plus the tree level. This allows the page header format to be generic while still supporting specialized needs.

pd_pagesize_version

The pd_pagesize_version field (2 bytes) encodes both the page size and the page layout version. This is important for pg_upgrade and for debugging. If PostgreSQL loads a page and sees an unexpected version number, it knows the page might have been written by an older or newer version with a different tuple layout. This prevents silent corruption when mixing versions.

pd_prune_xid

Finally, pd_prune_xid (4 bytes) stores the oldest transaction ID that deleted or updated a tuple on this page. This is used for HOT (Heap-Only Tuple) pruning, which we'll cover in depth in the MVCC module. For now, just know that it's an optimization hint that helps PostgreSQL decide whether it's worth trying to clean up dead tuples on this page.

Line Pointers : The Indirection Layer

The size of each line pointers are exactly 4 bytes and point towards the some tuple within the page. This level of indirection is absolutely fundamental to how postgreSQL works.

When i was learning this concept first in university, i was overwhelmed by the concept of slotted array (Line Pointers Array), its usage and all. But i got intuition after i took the database course by Andy Palvo (CMU). Okay now answer the question Why indirection? Consider what happens without it. An index entry for a B-tree index contains a pointer to a heap tuple. If that pointer is a direct byte offset into the page—say, "the tuple is at byte 1000"—then what happens if we need to move the tuple within the page? Maybe we're doing a HOT update, or maybe we're compacting the page to defragment free space. If we move the tuple, we have to update every index entry that points to it. That's expensive and complex.

Line pointers solve this. An index entry doesn't point to byte 1000. It points to "page 5, line pointer 3." The line pointer itself points to byte 1000. If we need to move the tuple, we just update the line pointer. All the indexes remain valid without any changes. This is why line pointers are sometimes called "item pointers" or "item IDs"—they're stable identifiers for tuples.

Each Line Pointer Packs Three Pieces of Information

typedef struct ItemIdData {
    unsigned lp_off:15;     /* Offset to tuple (0-32767) */
    unsigned lp_flags:2;    /* Status flags (4 states) */
    unsigned lp_len:15;     /* Tuple length (0-32767) */
} ItemIdData;

lp_off is also an pointer that stores starting address of specific tuple

lp_len stores the length of the tuple in bytes

lp_flags is where things get interesting. with 2 bits we can represent 4 different states.

lp_flags internals

lp_unused (0) means this line pointer has never been used, or it has been used but reclaimed by the VACUUM. it is available for reuse.

lp_normal (1) means this line pointer points to an actual tuple. tuple might be live or dead version waiting till VACUUM. we cannot tell from the line pointer alone - we have to look header of that particular tuple.

lp_redirect (2) is special. It means this line pointer doesn't point to tuple at all. Instead, it points to another line pointer. This happens during HOT updates. When a tuple is updated in a way that doesn't affect any indexed columns, PostgreSQL can create the new version on the same page and set up a redirect: "The tuple you're looking for is now at line pointer 5." Indexes still point to the original line pointer, and they automatically follow the redirect. This saves having to update every index.

lp_dead (3) means the tuple this line pointer used to point to is definitely dead—no transaction can see it anymore—but VACUUM hasn't reclaimed the space yet. This is useful during index scans. If an index points to a dead line pointer, we can immediately skip it without having to fetch and examine the tuple.

Understanding line pointer states is essential for understanding tuple lifecycle and MVCC. A common mistake is thinking that when you delete a row, it's immediately removed. It's not. The tuple stays right where it is. The line pointer might eventually be marked LP_DEAD, but even then, the tuple data is still there, occupying space, until VACUUM runs.

CREATE EXTENSION pageinspect;
CREATE TABLE lp_demo (id INT, data TEXT);

INSERT INTO lp_demo VALUES (1, 'version1');

SELECT lp, lp_off, lp_flags, lp_len 
FROM heap_page_items(get_raw_page('lp_demo', 0));

Again

UPDATE lp_demo SET data = 'version2' WHERE id = 1;

SELECT lp, lp_off, lp_flags, lp_len 
FROM heap_page_items(get_raw_page('lp_demo', 0));

now observe the differences between the output. In the next blog we will again make our hands dirty with tuple layout.

PostgreSQL Page Structure - (Slotted Pages)

Description Of Important Headers

Line Pointers : The Indirection Layer

Each Line Pointer Packs Three Pieces of Information

lp_flags internals

Again

Comments

Demystifying Postgres

Tuple Layout

More from this blog

Understanding HTTP MIME Types

Centralized Cache Key Management In Redis

Postgres Multi Version Concurrency Control - MVCC

PostgreSQL TOAST Storage Models

Tuple Layout

Command Palette

Description Of Important Headers

Line Pointers : The Indirection Layer

Each Line Pointer Packs Three Pieces of Information

lp_flags internals

Again

Comments

Demystifying Postgres

Tuple Layout

More from this blog