Seeder Question: I have seen a lot of references to 'pages' when reading information about SQL Server - what exactly is a page, and how does it relate to how my databases work?
asked Oct 26, 2009 at 12:14 PM in Default
Matt Whitfield ♦♦
A page is the fundamental unit of storage in MS-SQL, used by the database engine to manage I/O from disk to RAM and back. Currently the page size is 8k. In other words, whenever data is read from the hard drive or written to the hard drive it is done in 8k increments. Pages are grouped into extents, which are the are the basic unit in which space is allocated to tables and indexes. Each extent consists of 8 pages or 64k.
As a practical matter, we always seek to minimize phsical I/O operations because they take place at mechanical speed, while logical operations in memory run at much faster electronic speed.
Whenever the rows of a result set can be found on the same page, they can be processed optimally, because only one physical I/O operation will be necessary. After the page is in memory, all reads will be fast logical reads from row to row, in memory. Similarly if rows being updated or inserted all lie on the same page, only one physical write is required.
This leads to a number of design considerations. First, we want keep rows as small as possible, so that more rows can fit on a single page. Among other things, this involves normalisation to omit unnecessary columns and selection of appropriate datatypes that take up less space in memory.
Second, some thought must be given to the sequence of the data when it's being retrieved. A classic example is that of an order with a number of line items. A very common way of retrieving this data is to get all line items for a specific order number. If the clustered index is created for the an OrderDetail table on (orderID, lineItemID), the rows of a particular order are more likely to fall together on a single page, or at worst adjacent pages. "Covering" indexes may also be created to group rows logically to satisfy commonly used queries.
However, this grouping can also slow performance down as the pages become fragmented. Fragmentation occurs when a row has to be inserted into a page which is already full, or when an update to a row makes the row too large to fit in the available space on a page.
In such a situation, the page is split into two different pages, so that empty space is available for the new or updated row. This requires additional work by the system to perform the insert or update. This also means that following the insert/update there will be fewer rows per page on average, therefore more physical I/O is required.
For this reason, tables and indexes must be periodically reorganized or rebuilt when the fragmentation percentage gets too high.