vHadoop: A Scalable Hadoop Virtual Cluster Platform for MapReduce-Based Parallel Machine Learning with Performance Consideration

6 Author(s)
Kejiang Ye ; Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China ; Xiaohong Jiang ; Yanzhang He ; Xiang Li

Big data processing is becoming increasingly important due to the continuous growth in the amount of data generated in fields such as particle physics, human genomics, and earth observation. However, the efficiency of processing large-scale data on modern virtual infrastructure, especially virtualized cloud computing infrastructure, is not clear. This paper focuses on the performance of Hadoop virtual clusters and proposes vHadoop, a scalable Hadoop virtual cluster platform for large-scale MapReduce-based parallel data processing. We first describe the design and implementation of the vHadoop platform. We then perform a series of experiments to investigate both the static and dynamic performance of the platform, including the performance characterization of a cross-domain Hadoop virtual cluster and the live migration of a Hadoop virtual cluster. After that, we use the vHadoop platform to run six typical parallel clustering algorithms (Canopy, Dirichlet, Fuzzy k-Means, k-Means, Mean Shift, and MinHash) on two typical datasets. Experimental results verify the efficiency of the vHadoop platform in processing MapReduce-based parallel machine learning applications.

Published in:

Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012 IEEE International Conference on

Date of Conference:

24-28 Sept. 2012