
Optimal Algorithms for Cross-Rack Communication Optimization in MapReduce Framework


Authors: Li-Yung Ho (Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan); Jan-Jan Wu; Pangfeng Liu

MapReduce is a widely used data-parallel programming model for large-scale data analysis. The framework has been shown to scale to thousands of computing nodes and to run reliably on commodity clusters. However, research has shown that there is still room to improve its performance. One of the main bottlenecks is the all-to-all communication between mappers and reducers, which can saturate the top-of-rack switches and inflate job execution time, so reducing cross-rack communication improves job performance. In the current MapReduce implementation, task assignment follows a pull model, in which cross-rack traffic is difficult to control. The framework does, however, allow more flexibility in assigning reducers to computing nodes. In this paper, we investigate the reducer placement problem (RPP): placing reducers so as to minimize cross-rack traffic. We devise two optimal algorithms for RPP and implement them in Hadoop. We also propose an analytical solution to the problem. Our experimental results with a set of MapReduce applications show that our optimization achieves a 9% to 32% performance improvement over unoptimized Hadoop.
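The abstract does not state the paper's cost model or describe its two optimal algorithms. The sketch below only illustrates what "placing reducers to minimize cross-rack traffic" can mean, under a simplified rack-level model that is an assumption of this example. In that model, `data[s][p]` is the volume of intermediate output that mappers on rack `s` produce for partition `p`. Placing partition `p`'s reducer on rack `r` makes every byte from the other racks cross a rack boundary. Each rack also has a hypothetical limit, `slots[r]`, on how many reducers it can host. The function names are hypothetical. The search is a plain exhaustive one: it is optimal under this model but exponential in the number of partitions, so it is not the authors' method.

```python
from itertools import product

def cross_rack_traffic(data, placement):
    """Total shuffle volume that crosses a rack boundary (assumed model).

    data[s][p]: intermediate bytes produced on rack s for partition p.
    placement[p]: rack hosting the reducer of partition p.
    """
    total = 0
    for p, r in enumerate(placement):
        # Every byte for partition p produced outside rack r crosses racks.
        total += sum(data[s][p] for s in range(len(data)) if s != r)
    return total

def place_reducers(data, slots):
    """Exhaustively find a placement of minimum cross-rack traffic.

    slots[r] is a hypothetical per-rack limit on hosted reducers.
    Returns (placement, cost), or (None, None) if no placement fits.
    Exponential in the number of partitions: small examples only.
    """
    num_racks, num_parts = len(data), len(data[0])
    best, best_cost = None, None
    for placement in product(range(num_racks), repeat=num_parts):
        # Discard placements that exceed any rack's reducer slots.
        if any(placement.count(r) > slots[r] for r in range(num_racks)):
            continue
        cost = cross_rack_traffic(data, placement)
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost
```

For example, with two racks and three partitions, `place_reducers([[10, 2, 5], [1, 8, 5]], [2, 2])` puts partition 0 on rack 0 and partition 1 on rack 1, where most of their data already resides. It returns `((0, 1, 0), 8)`. If rack 0 may host only one reducer (`slots=[1, 2]`), the result becomes `((0, 1, 1), 8)`.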

Published in: 2011 IEEE International Conference on Cloud Computing (CLOUD)

Date of Conference: 4-9 July 2011