Data Cube Materialization and Mining over MapReduce

Authors
Arnab Nandi (The Ohio State University, Columbus); Cong Yu; Philip Bohannon; Raghu Ramakrishnan

Computing interesting measures for data cubes and subsequent mining of interesting cube groups over massive data sets are critical for many important analyses done in the real world. Previous studies have focused on algebraic measures such as SUM that are amenable to parallel computation and can easily benefit from the recent advancement of parallel computing infrastructure such as MapReduce. Dealing with holistic measures such as TOP-K, however, is nontrivial. In this paper, we detail real-world challenges in cube materialization and mining tasks on web-scale data sets. Specifically, we identify an important subset of holistic measures and introduce MR-Cube, a MapReduce-based framework for efficient cube computation and identification of interesting cube groups on holistic measures. We provide extensive experimental analyses over both real and synthetic data. We demonstrate that, unlike existing techniques which cannot scale to the 100 million tuple mark for our data sets, MR-Cube successfully and efficiently computes cubes with holistic measures over billion-tuple data sets.
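The distinction the abstract draws between algebraic and holistic measures can be illustrated with a small sketch. It is not taken from the paper and is not MR-Cube itself: it assumes TOP-K means "the k most frequent values", and the partitions, the `top_k` helper, and the sample data are hypothetical. It shows that per-partition SUM results combine exactly, while merging per-partition top-k lists can miss the true global answer.

```python
from collections import Counter


def top_k(values, k):
    """Return the k most frequent values with their counts (hypothetical helper)."""
    return Counter(values).most_common(k)


# Hypothetical data, split across two partitions as a mapper would see it.
numbers = [[1, 2, 3], [4, 5]]
labels = [
    ["x", "x", "x", "y", "y"],
    ["z", "z", "z", "y", "y"],
]

# Algebraic measure: partial sums combine into the exact global sum.
partial_sums = [sum(part) for part in numbers]
combined_sum = sum(partial_sums)
true_sum = sum(v for part in numbers for v in part)

# Holistic measure: merging only each partition's local top-1 loses "y",
# which is never the local winner but is the most frequent value overall.
merged = Counter()
for part in labels:
    for value, count in top_k(part, 1):
        merged[value] += count
naive_top1 = merged.most_common(1)[0][0]
true_top1 = top_k([v for part in labels for v in part], 1)[0][0]

print("sum from partials:", combined_sum, "true sum:", true_sum)
print("top-1 from local top-1 lists:", naive_top1, "true top-1:", true_top1)
```

In this toy case the partial sums reproduce the global sum, whereas the naive merge reports a value other than `y`. That gap is why a holistic measure needs all the data for a cube group in one place, or a more careful partitioning scheme.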

Published in:

IEEE Transactions on Knowledge and Data Engineering (Volume: 24, Issue: 10)