Accelerating MapReduce Analytics Using CometCloud

4 Author(s)
AbdelBaky, M. ; NSF Autonomic & Cloud Comput. Center, Rutgers Univ., Piscataway, NJ, USA ; Hyunjoo Kim ; Rodero, I. ; Parashar, M.

MapReduce-Hadoop has emerged as an effective framework for large-scale data analytics, providing support for executing jobs and storing data in a parallel and distributed manner. MapReduce has been shown to perform very well in large datacenters running applications where the data can be effectively divided into homogeneous chunks processed across homogeneous hardware. However, the performance of MapReduce-Hadoop is far from ideal when the hardware, the datasets, or both are heterogeneous. Such heterogeneity is unavoidable in many academic computing environments that use multiple generations of hardware and share resources among users. Heterogeneity is also unavoidable in scientific applications that process a varying number of datasets of different sizes. In these cases, the performance of MapReduce-Hadoop can be a concern. In this paper, we implement MapReduce on top of CometCloud to address the issue of heterogeneity and to support application classes that involve irregular datasets (e.g., a large number of small data files or datasets of varying sizes). Furthermore, we develop an autonomic manager that can schedule MapReduce tasks based on user objectives, provision resources accordingly, and support on-demand scale-up and cloudbursts. These resources can be selected from a hybrid infrastructure such as local clusters, data centers, and public clouds. The performance of the developed solution is verified using a protein data mining application operating on data from the Protein Data Bank. The application is deployed, based on deadline and budget constraints, on a cluster at Rutgers and/or Amazon EC2 resources. The experimental results show that the MapReduce-CometCloud framework can effectively support applications operating on large numbers of small data files in a heterogeneous and distributed environment, and can satisfy user objectives autonomically using cloudbursts.
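
The deadline- and budget-driven provisioning described in the abstract can be illustrated with a minimal sketch. This is not the paper's autonomic manager and it does not use any CometCloud API. The `Pool` class, the `plan_cloudburst` function, the per-task time estimates, the prices, and the whole-hour billing rule are all hypothetical assumptions chosen for illustration. The sketch uses every local worker first, assumes local workers are free, and bursts to the cloud only for tasks the local pool cannot finish before the deadline.

```python
import math
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical resource pool, e.g. a local cluster or a public cloud."""
    name: str
    max_workers: int
    secs_per_task: float         # assumed average time for one worker to run one task
    cost_per_worker_hour: float  # assumed price; 0.0 for a local cluster


def plan_cloudburst(num_tasks, deadline_secs, local, cloud, budget):
    """Return (local_workers, cloud_workers), or None if no plan fits.

    Assumption: all local workers are used first at no cost, then the
    fewest cloud workers needed to meet the deadline are added.
    """
    # Whole tasks a single worker can complete before the deadline.
    local_per_worker = math.floor(deadline_secs / local.secs_per_task)
    cloud_per_worker = math.floor(deadline_secs / cloud.secs_per_task)

    # Tasks left over after the local pool is fully used.
    remaining = max(0, num_tasks - local.max_workers * local_per_worker)
    if remaining == 0:
        return local.max_workers, 0  # no cloudburst needed

    if cloud_per_worker == 0:
        return None  # a cloud worker cannot finish even one task in time

    cloud_workers = math.ceil(remaining / cloud_per_worker)
    if cloud_workers > cloud.max_workers:
        return None  # deadline cannot be met with the available cloud workers

    # Assumption: cloud workers are billed in whole hours for the full deadline window.
    hours = math.ceil(deadline_secs / 3600)
    cost = cloud_workers * hours * cloud.cost_per_worker_hour
    if cost > budget:
        return None  # deadline is reachable, but not within the budget

    return local.max_workers, cloud_workers


# Illustrative numbers only; none of them come from the paper.
local = Pool("local-cluster", max_workers=8, secs_per_task=60.0, cost_per_worker_hour=0.0)
cloud = Pool("public-cloud", max_workers=20, secs_per_task=30.0, cost_per_worker_hour=0.10)
plan = plan_cloudburst(num_tasks=1000, deadline_secs=3600, local=local, cloud=cloud, budget=1.0)
print(plan)
```

With these made-up numbers the local pool can finish 480 of the 1000 tasks within the hour, so the remaining 520 are sent to the cloud pool. A real scheduler would also have to account for data transfer time, stragglers, and task sizes that vary, which is the irregularity the abstract emphasizes.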

Published in:

Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on

Date of Conference:

24-29 June 2012
