Skip to main content

Understanding database [3] : Buffer algorithms

Understanding database 3 : Buffer algorithms

As we use buffers to boost up our performance, it is also necessary for us to choose a suitable algorithm for our buffer to follow. With formal expression, the access pattern is the key of ensuring the number of I/Os to be controlled, which has a massive impact on the performance of execution.

We introduce the formula for read efficiency here, noted as
(number of page requests - physical I/Os) /(number of page requests)
And we should also aim for above 0 .8 or 80% for an adequate efficiency.

We have 4 basic policies to decide for replacing frames within a buffer, including

First in first out (FIFO)
->oldest leaves
Least recently used (LRU)          <--------------------------- most common
->frame with oldest access leaves
Most recently used (MRU)
->frame with freshest access leaves
Least frequently used (LFU)
->frame with least access count leaves

Most database management systems would use a variant of LRU called the CLOCK, where the CLOCK would keep a circular list (consider the pages are the times on the clock), and also a hand pointing to the oldest page on the clock. Each of the frames would have a bit variable that is initially 0, but will be set to 1 on each page request to the frame. When it comes to the time when the clock needs to kick out a page, the first page with a 0 for their bit will be chosen. If the frame we are iterating through has a bit of 1, we will set its bit to 0 after sweeping through the frame.

An elaboration on the CLOCK method is the generalized CLOCK, namely the GCLOCK, where instead of setting an upper limit of 0 for the frame, each request will increment the bit by 1 and sweep of hand will decrement the bit. Similarly, for a frame to be replaced, the bit has to be 0.

If the requests can be predicted, for instance, a sequential scan will have a confined pattern, pages may be pre-fetched by many at the same time

Short topic : OS File System

The reason why DBMS takes over instead of OS for the disk space and buffer management is that OS may have significant portability issues. 


Popular posts from this blog

Understanding database [6] : Clustered Index

A clustered index is what is good for a range search over a range of search key values, and the index entries and the rows are ordered the same way, which is different from unclustered indexes (secondary indexes). To use a clustered index, we use the index to locate the first index entry at the start of the range, which where the first row is located at. If the index is clustered, subsequent rows will be stored in successive locations with the ordering , therefore it may minimize the page transfers and maximizes cache hits. For one table, there may only be one clustered index, whereas there can be as many unclustered indexes as you may want  to create. Unclustered indexes aren ’ t as good as clustered, but they might become necessary when it comes to finding for other attributes apart from the primary key. Smaller topics : Dense index & Sparse index When we say sparse index, we mean that there is an index entry for each page of the data file. With this structuring

Understanding database [9] : Choosing indexes

Understanding database 9 : Choosing indexes When choosing indexes, we choose the best plan that suits for the queries, and look for additional indexes that may potentially upgrade upon that. Before creating, we must also consider the impact on updates in the workload, such that indexes take disk space. For a query, the WHERE clause are the main focus point to make indexes on, where exact matches suggest a hash index and range queries suggest a tree index. Clustering is extremely helpful when it comes to range queries, and may also help with equality queries if there are duplicates. Search keys with multiple attribute should be considered if a WHERE clause contains multiple conditions, and the order of attributes is important for range queries. Searching may become ‘index-only’ with such indexes.