Improving network performance through task duplication for parallel applications on clusters

1 Author(s)
Xiao Qin ; Dept. of Comput. Sci., New Mexico Inst. of Min. & Technol., Socorro, NM, USA

While data replication is widely used in clusters to provide fault tolerance, it can heavily stress communication networks and degrade the overall performance of parallel applications. The performance degradation is particularly unacceptable for disk-write-intensive applications. As a result, data duplication management for parallel applications running on clusters is a significant and urgent challenge. This paper presents the design, implementation, and evaluation of a network-aware task duplication management system, called TUFF, in which redundant data is regenerated by corresponding duplicate tasks rather than replicated directly over the network. In addition, TUFF can improve the availability of parallel applications, because it allows two replicas of each I/O-intensive task to be executed on two different nodes. We have implemented and evaluated TUFF using extensive simulations under a diverse set of workload conditions. Experimental results show that TUFF improves the overall performance of parallel applications running on clusters by efficiently reducing network resource consumption.
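
The trade-off described in the abstract can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the TUFF implementation: the `Task` type, the byte counts, and the rule that a duplicate task needs only its input shipped to the second node are all hypothetical and chosen for illustration. Under those assumptions, regenerating data with a duplicate task uses less network traffic than copying the output whenever a task writes more than it reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; not taken from the paper."""
    name: str
    input_bytes: int   # data the task must read before it runs
    output_bytes: int  # data the task writes to disk


def replication_traffic(task: Task) -> int:
    # Assumption: the task runs once and its written output is
    # copied over the network to a second node.
    return task.output_bytes


def duplication_traffic(task: Task) -> int:
    # Assumption: the task runs on two nodes, so only its input has
    # to reach the second node; the output is regenerated locally.
    return task.input_bytes


def cheaper_strategy(task: Task) -> str:
    # Pick whichever strategy moves fewer bytes across the network.
    if duplication_traffic(task) < replication_traffic(task):
        return "duplicate"
    return "replicate"


if __name__ == "__main__":
    for t in (Task("write_heavy", 10, 500), Task("read_heavy", 500, 10)):
        print(t.name, cheaper_strategy(t))
```

In this model a disk-write-intensive task (small input, large output) favors duplication, which matches the abstract's emphasis on write-heavy workloads. The model ignores the extra CPU time spent running the task twice.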

Published in:

Performance, Computing, and Communications Conference, 2005. IPCCC 2005. 24th IEEE International

Date of Conference:

7-9 April 2005