A slotted page is a fixed-size page (or block) in a database's storage layer that is designed to efficiently manage and organize variable-length records.
This structure is a cornerstone of many database management systems (DBMS), including PostgreSQL, because it addresses the complexities of inserting, updating, and deleting records of different sizes without causing excessive fragmentation or requiring expensive data shuffling.
Core components
A slotted page is typically divided into three main regions:
- Header: Located at the beginning of the page, the header stores metadata about the page itself. Key information includes:
- Free space pointers: These define the boundaries of the free space region within the page. In some implementations,
pd_lowerpoints to the start of the free space (growing upward toward the data), andpd_upperpoints to the end of the free space (growing downward). - Specialized metadata: Other information includes the page's ID, checksums, versioning details, and flags for things like compression.
- Free space pointers: These define the boundaries of the free space region within the page. In some implementations,
- Slot array (or ItemIdData): This array of fixed-size pointers or offsets is located after the header and grows toward the center of the page. Each entry in the slot array points to the starting location of a record in the data area. Because the slot size is fixed (e.g., 4 bytes in PostgreSQL), the slot array can be easily searched or manipulated.
- Data area: This region contains the actual data records (or tuples). It starts from the end of the page and grows toward the center, moving opposite the slot array. Records in this area are not necessarily in any particular order. The space between the slot array and the data area is the page's free space.
Operations and advantages
The slotted page structure provides an elegant solution to several challenges in database storage.
Handling variable-length records
- When a new record is inserted, it is placed in the free space at the end of the page, and a new slot is added to the slot array pointing to it.
- This approach avoids the need to maintain a separate, fixed-size area for each record, which would lead to wasted space for smaller records.
Deletions
- When a record is deleted, its corresponding slot is simply marked as free or removed from the slot array.
- The record's data is not immediately deleted but instead becomes a gap in the data area. This gap can later be reclaimed during compaction or reused by a newly inserted record that fits the space.
- This prevents the need for an expensive page-wide reorganization, which would involve moving every subsequent record just to fill the space.
Updates
- Growing record: If an updated record becomes larger than its original space, the new record is typically inserted elsewhere on the page, or even on a new overflow page, leaving a forwarding pointer in the original location.
- Shrinking record: If an updated record becomes smaller, the change is applied in place, leaving a small gap of free space that can be reclaimed later. The slot pointer is updated to reflect the new, smaller size, and no data needs to be moved.
Logical vs. physical separation
- Slotted pages decouple the logical order of records from their physical storage location.
- A record's identifier (or tuple ID) is typically a combination of its page number and its slot number. Because the slot number remains constant even if the record's physical offset on the page changes, indexes can always reliably find a record by its stable tuple ID.
Compaction and fragmentation management
- Over time, repeated insertions and deletions can lead to internal fragmentation, with free space distributed in small, unusable gaps.
- To address this, DBMSs can perform periodic compaction (or "vacuuming" in PostgreSQL). This process reorganizes the data area, moving records to eliminate gaps and consolidate all free space into one contiguous block. The slot array is then updated to reflect the new offsets.
- Because compaction only requires updating the lightweight slot array, it is a far more efficient operation than managing data in a structure that does not use slots.
Performance considerations
The slotted page design offers significant performance benefits but also comes with some trade-offs:
Advantages
- Efficient space utilization: The structure effectively handles variable-length records, minimizing wasted space due to unused record-specific padding.
- Fast record access: The slot array provides a quick way to locate any record on a page. Since the slots are of fixed size, it is possible to perform a binary search on the array to find a specific record if the slots are kept sorted.
- Simplified management: Deletions and updates do not require shuffling large blocks of data, making these operations more efficient.
- Foundation for advanced features: The slotted page provides a robust foundation for building more complex database features, including B-tree indexes and Multi-Version Concurrency Control (MVCC).
Disadvantages
- Read overhead: A lookup requires a double dereference: first reading the slot entry and then reading the record's data. For simple, fixed-length record structures, this might be a single dereference.
- Fragmentation management: While easy to manage, fragmentation still occurs and may require periodic compaction, which consumes resources.
- Poor memory locality for large scans: The data area's fragmented nature can lead to poor memory locality when scanning large portions of the table, as records may not be stored contiguously in a way that aligns with the scan order.
Enjoyed this article? Share it with a friend.