# Enhancing the Search in MOLAP Sparse Data

## Abstract


## 1. Introduction

## 2. Bitmap Compression

**Figure 2.** Generated bitmap matrix corresponding to Figure 1.

## 3. Using Binary Search Tree to Store Compressed Data

#### 3.1. Compact Indexing Strategy

^{′ }of the same size:

#### 3.2. Inserting a Cell Value into the BST

**Figure 4.** Binary search tree corresponding to the matrix of Figure 1.

#### 3.3. Searching a Cell Value in the BST

#### 3.4. Balancing the Tree to Reduce the Time Complexity

- (1) The search algorithm presented for the BST (Figure 5) is also used to search for an element in the AVL tree.
- (2) The delete algorithm has the same complexity as the insert algorithm. We do not present a delete algorithm because data are usually appended to the MOLAP store, which is read-only: once added, data are not deleted [13]. However, a refresh algorithm is provided that destroys the whole BST by deleting all of its nodes at once. This algorithm can be used to fully refresh the MOLAP.
- (3) The time complexity of operations in the worst, average, and best cases is O(log(n)), where n is the number of non-null values in the multidimensional data cube.
- (4) The compact indexing strategy can also be used to store the non-null values of the multidimensional data cube in a B+-tree structure [14], in which the key index serves as the key and the non-null value as the record content. With this structure, the complexity of operations in all cases is O(log_m(n)), where m is the order of the B+-tree and n is the number of non-null values in the multidimensional data cube. For further optimization, a clustered index can be constructed over the key indexes.
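
The points above can be illustrated with a minimal AVL sketch. It is not the paper's implementation: it assumes each non-null cell has already been mapped to a single integer key index, and the names `Node`, `insert`, and `search` are hypothetical. It shows a balanced insert, the plain BST search reused on the balanced tree, and a refresh that drops every node at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int                      # key index of a non-null cell (assumed integer)
    value: float                  # the non-null cell value
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def height(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node: Optional[Node], key: int, value: float) -> Node:
    """Insert (key, value) and return the root of the rebalanced subtree."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value            # same cell: overwrite the stored value
        return node
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                   # left subtree too tall
        if key > node.left.key:       # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                  # right subtree too tall
        if key < node.right.key:      # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def search(node: Optional[Node], key: int) -> Optional[float]:
    """Ordinary BST search; returns None for a null (absent) cell."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


# Keys arriving in ascending order would degenerate a plain BST into a list;
# the rotations keep the tree height logarithmic.
root = None
for k in range(1, 8):
    root = insert(root, k, float(k) * 10)
print(root.key, root.height, search(root, 5), search(root, 99))

root = None  # "refresh": dropping the root releases all nodes at once
```

In a garbage-collected language the refresh step reduces to discarding the root reference; an implementation with manual memory management would need a traversal that frees each node.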

**Figure 5.** Balanced BST corresponding to the matrix of Figure 1.

## 4. Incremental Hashing

#### 4.1. Handling the Operations

#### 4.1.1. Insert Operation

#### 4.1.2. Search Operation

## 5. Empirical Results

| 3-D cube size (number of cells/facts) | Classical Bitmap, time (ms) | Balanced BST, time (ms) | Incremental Hash, time (ms) |
|---|---|---|---|
| 10^3 = 1000 | 17.1 | 16.21 | 15.24 |
| 20^3 = 8000 | 53.73 | 17.28 | 16.98 |
| 30^3 = 27,000 | 168.2 | 19.88 | 19.34 |
| 40^3 = 64,000 | 328.4 | 21.63 | 21.67 |
| 50^3 = 125,000 | 653.6 | 26.47 | 24.81 |
| 60^3 = 216,000 | 993.2 | 30.01 | 27.12 |
| 70^3 = 343,000 | 1394 | 34.12 | 31.23 |
| 80^3 = 512,000 | 2570 | 50.76 | 35.22 |
| 90^3 = 729,000 | 3047 | 69.20 | 40.03 |
| 100^3 = 1,000,000 | 59,911 | 155.31 | 46.52 |
| 1000^3 = 1,000,000,000 | 600,054 | 232.01 | 70.39 |
| 10^16 | 52,518,054 | 2120 | 1412 |
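
To read the table as relative gains rather than raw times, the ratios can be computed directly from the reported figures. The following is a small sketch using three rows copied from the table; the helper name `speedups` is hypothetical and not part of the paper.

```python
# (cube size, classical bitmap ms, balanced BST ms, incremental hash ms),
# copied from three rows of the results table
rows = [
    ("10^3", 17.1, 16.21, 15.24),
    ("100^3", 59911.0, 155.31, 46.52),
    ("1000^3", 600054.0, 232.01, 70.39),
]


def speedups(bitmap_ms: float, bst_ms: float, hash_ms: float) -> tuple[float, float]:
    """Return how many times faster the BST and the hash are than the bitmap."""
    return bitmap_ms / bst_ms, bitmap_ms / hash_ms


for size, bitmap_ms, bst_ms, hash_ms in rows:
    bst_x, hash_x = speedups(bitmap_ms, bst_ms, hash_ms)
    print(f"{size}: balanced BST {bst_x:.1f}x, incremental hash {hash_x:.1f}x")
```

The ratios are close to 1 for the smallest cube and grow by orders of magnitude for the larger ones, which is the trend the raw times suggest.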

## 6. Conclusions

## References and Notes

- Inmon, W.H. The data warehouse environment. In Building the Data Warehouse, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2002; pp. 31–77.
- Kimball, R.; Ross, M. Dimensional modeling primer. In The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2002; pp. 1–27.
- Li, J.; Rotem, D.; Wong, H. A new compression method with fast searching on large databases. In Very Large Data Bases: Proceedings of the Thirteenth International Conference on Very Large Data Bases, Brighton, England, 1–4 September 1987; Morgan Kaufmann: San Francisco, CA, USA, 1987; pp. 311–318.
- Moffat, A.; Zobel, J. Parameterised compression for sparse bitmaps. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, 21–24 June 1992; Belkin, N.J., Ingwersen, P., Pejtersen, A.M., Eds.; ACM Press: New York, NY, USA, 1992; pp. 274–285.
- Chan, C.Y.; Ioannidis, Y.E. Bitmap index design and evaluation. Sigmod Rec. **1998**, 34, 355–366.
- Vaidyanathan, J.K.; Yang, G.; Agrawal, G. Communication and memory optimal parallel data cube construction. IEEE Trans. Parallel Distrib. Syst. **2005**, 16, 1105–1119.
- Ester, M.; Kohlhammer, J.; Kriegel, H.P. The DC-tree: A fully dynamic index structure for data warehouses. In Proceedings of the 16th International Conference on Data Engineering, San Diego, CA, USA, 29 February–3 March 2000; pp. 379–388.
- Allen, B.; Munro, I. Self-organizing binary search trees. J. ACM **1978**, 25, 526–535.
- Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Binary search trees. In Introduction to Algorithms, 2nd ed.; MIT Press: Cambridge, MA, USA, 1990; pp. 253–272.
- Zalaket, J. Speed up the search in bitmap based compressed sparse arrays. In Proceedings of the International Conference on Information Management and Engineering, Kuala Lumpur, Malaysia, 3–5 April 2009; pp. 142–146.
- In this example, the indexes arrive in ascending order, but they can arrive in any order without affecting our goal, which is to obtain a compressed structure.
- Adelson-Velskii, G.; Landis, E.M. An algorithm for the organization of information. Sov. Math. Dokl. **1962**, 146, 1259–1263. Translated by Ricci, M.J.
- Data can be modified or deleted from the MOLAP dimension tables, but here the balanced BST represents facts, which in general are not deleted directly from the fact table but are canceled by adding negative entries when necessary.
- Elmasri, R.; Navathe, S. Fundamentals of Database Systems, 2nd ed.; Addison Wesley: Boston, MA, USA, 2010; pp. 646–659.
- Fusco, F.; Vlachos, M.; Stoecklin, M. Real-time creation of bitmap indexes on streaming network data. VLDB J. **2012**, 21, 287–307.
- In our implementation, a dichotomous (binary) search is applied to the sorted root vector, which takes logarithmic time. The linear-time search illustrated in the text is presented only for simplicity.
- The compression ratio increases as the amount of data increases. The same benchmarks as in Table 1 are used to calculate the compression ratio.

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).


Zalaket, J.
Enhancing the Search in MOLAP Sparse Data. *Information* **2012**, *3*, 661-675.
https://doi.org/10.3390/info3040661
