Understanding database [8] : Bitmap indexes

Understanding database 8 : Bitmap indexes

The idea of a bitmap indexes is to record values of sparse columns as a sequence of bits, where one bit would represent for each possible value. The bit for the row, would indicate whether if it has this value or not. Formally, we may consider it as a collection of n bit-vectors, where we have one for each possible value f.

Six vector operations are supported by bitmap indexes, namely COUNT, LENGTH,OR,AND, NOT and XOR. In order to find the correct bitmap vector for a given value ,we may build a B+ tree over the possible values, with the leaves storing the index entries. The modification of bitmap index based data needs to be dealt with caution : the record numbers must remain fixed once assigned, and the changes to the data file require the bitmap to change as well by setting or clearing corresponding bits in vectors. If a new record is to introduce a new value, we may have to introduce a new bitmap vector for that value.

As we may derive, we will need O(mn) space for the bitmap index, while m stands for the number of distinct values for field f, and n stands for the number of records. The problem is that the more distinct values we have, the sparser the bitmap vectors will be, and we will end up storing heaps of 0s in the vectors. For that reason, it would be a good idea to compress bitmap vectors for efficient storage.


Popular Posts