Energy Efficient Fault Tolerance for High Performance Computing (HPC) in the Cloud

5 Author(s)
Egwutuoha, I.P. (Sch. of Electr. & Inf. Eng., Univ. of Sydney, Sydney, NSW, Australia); Shiping Chen; Levy, D.; Selic, B.

With cloud computing, a large number of Virtual Machines (VMs) can be provisioned to form a high performance computing (HPC) system that runs computation-intensive applications under the Hardware as a Service (HaaS) model. Fault Tolerance (FT) for HPC in the cloud is an increasingly challenging issue, because any fault during execution forces the application to be re-run, which costs time, money and energy. The energy consumption of HPC systems in the cloud has increased significantly as a result of re-running applications and of fault-tolerance techniques such as redundant computing. In this paper we present an energy-efficient fault tolerance approach for HPC in the cloud. We develop a generic FT algorithm for HPC systems in the cloud. Our algorithm uses a proactive process-level migration approach; however, it does not rely on a spare node or on redundant computing before a failure is predicted. Our experimental results, obtained from a real cloud execution environment, show that the energy consumed by HPC in the cloud while providing fault tolerance can be reduced by as much as 30%.
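The abstract does not give the algorithm itself. The following is only a minimal, hypothetical sketch of the general idea of proactive process-level migration without a spare node. It assumes a failure predictor that flags VMs expected to fail. It also assumes the job's other healthy VMs have free process slots. Processes are moved off a flagged VM before it fails, so no idle spare VM and no redundant replicas consume energy in the meantime. The names `Node`, `capacity`, `predicted_failure` and `proactive_migrate` are illustrative assumptions and do not come from the paper. The sketch also assumes a greedy "most free slots" target choice and a checkpoint/restart fallback.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a VM taking part in an HPC job."""
    name: str
    capacity: int                                  # assumed max processes this VM can host
    processes: list = field(default_factory=list)  # IDs of processes currently on this VM
    predicted_failure: bool = False                # assumed flag set by a failure predictor

    def free_slots(self) -> int:
        return self.capacity - len(self.processes)


def proactive_migrate(nodes):
    """Move every process off VMs predicted to fail onto healthy VMs that
    already belong to the job (no dedicated spare node is kept idle).

    Returns a list of (process, source_vm, target_vm) tuples.
    Raises RuntimeError when no healthy VM has room; a real system would
    then need another mechanism, e.g. checkpoint/restart (assumption).
    """
    migrations = []
    healthy = [n for n in nodes if not n.predicted_failure]
    for node in nodes:
        if not node.predicted_failure:
            continue
        # Iterate over a copy because we remove processes while looping.
        for proc in list(node.processes):
            # Greedy choice: the healthy VM with the most free slots
            # (ties go to the earliest VM in the list).
            target = max(healthy, key=lambda n: n.free_slots(), default=None)
            if target is None or target.free_slots() <= 0:
                raise RuntimeError("no healthy capacity left for migration")
            node.processes.remove(proc)
            target.processes.append(proc)
            migrations.append((proc, node.name, target.name))
    return migrations
```

For example, if `vm2` is flagged while `vm0` and `vm1` still have free slots, its processes are spread over `vm0` and `vm1` before `vm2` goes down. The greedy target choice is only one plausible policy. A real scheduler might also weigh network locality or per-VM power draw.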

Published in:

Cloud Computing (CLOUD), 2013 IEEE Sixth International Conference on

Date of Conference:

June 28, 2013 to July 3, 2013