Skip to main content

Understanding database [4] : Indexes

Understanding database 4 : Indexes

As we have covered with little mentioning in the previous posts, indexes are crucial for the upgrade for performance of databases. There are 3 main types of indexes, including Tree-based indexes (ISAM, B+ tree), Hash based indexes (Extendible hashing, linear hashing) and bitmap indexes.

By definition, we consider index as a ‘a set of pages with pointers to the data page which contains the value’, namely we use a separate file for the ease of efficient accessing to various files. This upgrades the efficiency as we do not have to scan the entire relation (built up by tables etc) every time, but we will only have to look up the indexes which is much less data to search through. We may consider the use of index is more efficient if
(index lookup cost + record retrieval cost) < file scan cost
and we will make indexing decisions based on this relation.

We create an index in SQL with the format
CREATE INDEX ON <Table> (<attributes1>...)
and this allows us to create a customized index of our wanted attributes from one table that we would like to manipulate.

Typically, each index entry for a data record is a row, where each entry contains a search key value and pointer to a record in the table. The retrieval of records are based on the corresponding search key value, and the pointer references to the actual data record in the database.

Although we could grant a rather efficient look up performance for sorted files, it is still rather expensive for large data to update or to make access after all. As a solution, we introduce sparse indexes for the data file, and we give each page’s first entry a sparse index in order to allow for a further improved binary search. With the idea being recursively applied, we may able to generate Tree based indexes where we have non leaf levels that has its entries as pointers to index pages, and leaves as entries pointing to actual records. This is how we construct the tree based indexes that we have mentioned in the first paragraph. 


Popular posts from this blog

Understanding database [6] : Clustered Index

A clustered index is what is good for a range search over a range of search key values, and the index entries and the rows are ordered the same way, which is different from unclustered indexes (secondary indexes). To use a clustered index, we use the index to locate the first index entry at the start of the range, which where the first row is located at. If the index is clustered, subsequent rows will be stored in successive locations with the ordering , therefore it may minimize the page transfers and maximizes cache hits. For one table, there may only be one clustered index, whereas there can be as many unclustered indexes as you may want  to create. Unclustered indexes aren ’ t as good as clustered, but they might become necessary when it comes to finding for other attributes apart from the primary key. Smaller topics : Dense index & Sparse index When we say sparse index, we mean that there is an index entry for each page of the data file. With this structuring

Understanding database [9] : Choosing indexes

Understanding database 9 : Choosing indexes When choosing indexes, we choose the best plan that suits for the queries, and look for additional indexes that may potentially upgrade upon that. Before creating, we must also consider the impact on updates in the workload, such that indexes take disk space. For a query, the WHERE clause are the main focus point to make indexes on, where exact matches suggest a hash index and range queries suggest a tree index. Clustering is extremely helpful when it comes to range queries, and may also help with equality queries if there are duplicates. Search keys with multiple attribute should be considered if a WHERE clause contains multiple conditions, and the order of attributes is important for range queries. Searching may become ‘index-only’ with such indexes.