By Topic

VCCP: A transparent, coordinated checkpointing system for virtualization-based cluster computing

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Ong, H. ; Comput. Sci. & Math. Div., Oak Ridge Nat. Lab., Oak Ridge, TN, USA ; Saragol, N. ; Chanchio, K. ; Leangsuksun, C.

Virtual machine, which typically consists of a guest operating system (OS) and its serial applications, can be checkpointed, migrated to another cluster node, and restarted later to its previous saved state. However, to date, it is nontrivial to provide checkpoint-restart mechanisms with the same level of transparency for distributed applications running on a cluster of virtual machines. To address this particular issue, we have created the Virtual Cluster CheckPointing (VCCP) system, a novel system for transparent coordinated checkpoint-restart of virtual machines and its distributed application on commodity clusters. In this paper, we detail the design and implementation of the VCCP system. Our VCCP prototype extends the open source QEMU system with kqemu module by implementing hypervisor-based Coordinated Checkpoint-Restart protocols. To verify and validate our prototype, we measured its performance using the NAS parallel benchmark. Our experimental results indicate that VCCP generates less than 1% of additional execution overhead for non-communication intensive parallel applications. Furthermore, our correctness analysis shows that VCCP does not cause message loss or reordering, which is a necessary property to ensure correctness of checkpoint-restart mechanism. Finally, we believe that VCCP is a promising checkpoint-restart alternative for legacy applications that have implemented traditional process-level checkpoint-restart.

Published in:

Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on

Date of Conference:

Aug. 31 2009-Sept. 4 2009