Data compression is an effective technique for improving the performance of data warehouses. Aggregation and cube are important operations for on-line analytical processing (OLAP). It is a major challenge to develop efficient algorithms for aggregation and cube operations on compressed data warehouses. Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing aggregation and cube for multidimensional data warehouses (MDWs), which store datasets in multidimensional arrays rather than in tables. However, to our knowledge, little work to date in the literature describes aggregation algorithms on compressed data warehouses for multidimensional OLAP. The goal of this paper is to develop efficient algorithms to compute aggregation and cube on compressed MDWs. For aggregation operations, four algorithms are proposed. These algorithms operate directly on datasets compressed by mapping-complete compression methods, without first decompressing them. The algorithms have different performance behaviors as a function of the dataset parameters, the sizes of the outputs, and main memory availability. The algorithms are described, and their I/O and CPU cost functions are presented. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms perform better on sparse data than previous aggregation algorithms. For cube operations, this paper presents a novel algorithm to compute cubes on compressed data warehouses. The proposed algorithm also operates directly on compressed datasets without first decompressing them. The algorithm is applicable to a large class of mapping-complete data compression methods. The complexity of the algorithm is analyzed in detail.
The analytical and experimental results show that the algorithm is more efficient than all other existing cube algorithms. In addition, a heuristic algorithm to generate an optimal plan for computing cube on data warehouses is also proposed. In conclusion, direct manipulation of compressed data is an important tool for managing very large data warehouses. Aggregation and cube are just two such operations, albeit important ones. Additional algorithms will be needed for OLAP on compressed multidimensional data warehouses. We are currently working on algorithms for other operations on compressed MDWs. We are also working on algorithms for OLAP operations applicable to compression methods other than mapping-complete ones.
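To make the idea of aggregating directly on compressed data concrete, the following is a minimal Python sketch. It is not one of the paper's four algorithms. It assumes a toy compression scheme that stores only the non-zero cells of a row-major array together with their logical positions, and it treats "mapping-complete" as meaning that each compressed entry can be mapped back to its logical position (and a logical position to its coordinates). The names `compress`, `backward_map`, and `aggregate` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def compress(flat, shape):
    """Toy compression (an assumption, not the paper's method):
    keep only non-zero cells as parallel lists of row-major
    logical positions and values. `shape` is unused here but kept
    so the signature mirrors the other helpers."""
    positions = [i for i, v in enumerate(flat) if v != 0]
    values = [flat[i] for i in positions]
    return positions, values


def backward_map(pos, shape):
    """Map a row-major logical position back to array coordinates."""
    coords = []
    for n in reversed(shape):
        pos, c = divmod(pos, n)
        coords.append(c)
    return tuple(reversed(coords))


def aggregate(positions, values, shape, group_dims):
    """SUM over all dimensions not listed in `group_dims`, scanning
    only the compressed entries; the dense array is never rebuilt."""
    totals = defaultdict(int)
    for pos, v in zip(positions, values):
        coords = backward_map(pos, shape)
        key = tuple(coords[d] for d in group_dims)
        totals[key] += v
    return dict(totals)


# A sparse 2 x 3 array stored row-major: [[0, 5, 0], [7, 0, 2]]
shape = (2, 3)
positions, values = compress([0, 5, 0, 7, 0, 2], shape)
by_first_dim = aggregate(positions, values, shape, group_dims=(0,))
by_second_dim = aggregate(positions, values, shape, group_dims=(1,))
```

In this sketch the work is proportional to the number of stored non-zero cells rather than to the size of the full array, which is one intuition for why operating on the compressed form can help on sparse data. The sketch keeps all partial sums in memory and ignores the I/O cost and memory constraints that the abstract says distinguish the paper's algorithms.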
Date of Conference: 18-21 Sept. 2007