Skip to main content


Showing posts from November, 2017

Distributed systems algorithms : simplified explanation

Cristian's algorithm: I send you my time You get it and record your own time You send me back the time you recorded it and the time you send back I record the time I receive it me -(T1)> you you -(T2,T3)> me me -(T4)> me Berkeley Algorithm: Teacher shouts out what time is it Kids reply with their own time Take average and tell kids to follow this time

Understanding database [9] : Choosing indexes

Understanding database 9 : Choosing indexes When choosing indexes, we choose the best plan that suits for the queries, and look for additional indexes that may potentially upgrade upon that. Before creating, we must also consider the impact on updates in the workload, such that indexes take disk space. For a query, the WHERE clause are the main focus point to make indexes on, where exact matches suggest a hash index and range queries suggest a tree index. Clustering is extremely helpful when it comes to range queries, and may also help with equality queries if there are duplicates. Search keys with multiple attribute should be considered if a WHERE clause contains multiple conditions, and the order of attributes is important for range queries. Searching may become ‘index-only’ with such indexes.

Understanding database [8] : Bitmap indexes

Understanding database 8  : Bitmap indexes The idea of a bitmap indexes is to record values of sparse columns as a sequence of bits, where one bit would represent for each possible value. The bit for the row, would indicate whether if it has this value or not. Formally, we may consider it as a collection of n bit-vectors, where we have one for each possible value f. Six vector operations are supported by bitmap indexes, namely COUNT, LENGTH,OR,AND, NOT and XOR. In order to find the correct bitmap vector for a given value ,we may build a B+ tree over the possible values, with the leaves storing the index entries. The modification of bitmap index based data needs to be dealt with caution : the record numbers must remain fixed once assigned, and the changes to the data file require the bitmap to change as well by setting or clearing corresponding bits in vectors. If a new record is to introduce a new value, we may have to introduce a new bitmap vector for that value. As we ma

Understanding database [7] : Hash based indexes

Understanding database 7 : Hash based indexes When we are to implement hash based indexes, it is key that we should partition our data into buckets. As for the rule to allocate entries, we use each entry ’ s search key as its bucket id to partition, where each bucket will hold entries that share a same property. In order to determine what is that property we have chosen to partition with, we will have to introduce a hash function h(v), where v ranges over all the search key values. We will consider each bucket as a unit of storage containing one or more records that is stored in a page. It is often considered better to use hash based indexes for equality searches, and it is unable to support range searches due to the use of buckets. The steps of a equality search with hash index is as below : 1.  Given v value, compute with h(v) = a 2.  Retrieve the bucket a 3.  Search the wanted entry from the bucket As we may see from above, the cost of searching is equivalent to

Understanding database [6] : Clustered Index

A clustered index is what is good for a range search over a range of search key values, and the index entries and the rows are ordered the same way, which is different from unclustered indexes (secondary indexes). To use a clustered index, we use the index to locate the first index entry at the start of the range, which where the first row is located at. If the index is clustered, subsequent rows will be stored in successive locations with the ordering , therefore it may minimize the page transfers and maximizes cache hits. For one table, there may only be one clustered index, whereas there can be as many unclustered indexes as you may want  to create. Unclustered indexes aren ’ t as good as clustered, but they might become necessary when it comes to finding for other attributes apart from the primary key. Smaller topics : Dense index & Sparse index When we say sparse index, we mean that there is an index entry for each page of the data file. With this structuring

Understanding database [5] : Tree based indexes

As we have mentioned in the previous post, there exists a variety of indexes, and we are here to discuss Tree based indexes first (others to be posted later), continuing from the end of the previous post. For the ISAM (Indexed Sequential Access Method) variety, there exists a type of entry being called as the separator entry. Each separator entry is a 2-tuple (k,p) where k is a search key value, meanwhile p is a pointer to a lower level page. The search key value k will separate set of search key values in the 2 subtrees pointed at p and p-1, where p-1 is considered to be located before p according to their ordering. The leaf pages are initially to be stored sequentially, and they generally contain the data records (primary/integrated/main index). The separator levels will never change once they have been constructed, by that it means that the indexes are static for ISAM.  On top of that, the number and position of leaf pages in the file always remains the same.  As a resu

Understanding database [4] : Indexes

Understanding database 4   : Indexes As we have covered with little mentioning in the previous posts, indexes are crucial for the upgrade for performance of databases. There are 3 main types of indexes, including Tree-based indexes (ISAM, B+ tree), Hash based indexes (Extendible hashing, linear hashing) and bitmap indexes. By definition, we consider index as a ‘a set of pages with pointers to the data page which contains the value’, namely we use a separate file for the ease of efficient accessing to various files. This upgrades the efficiency as we do not have to scan the entire relation (built up by tables etc) every time, but we will only have to look up the indexes which is much less data to search through. We may consider the use of index is more efficient if (index lookup cost + record retrieval cost) < file scan cost and we will make indexing decisions based on this relation. We create an index in SQL with the format CREATE INDEX ON <Table> (<attri