By Topic

Lazy garbage collection of recovery state for fault-tolerant distributed shared memory

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
F. Sultan ; Dept. of Comput. Sci., Rutgers Univ., Piscataway, NJ, USA ; T. D. Nguyen ; L. Iftode

We address the problem of garbage collection in a single-failure fault-tolerant home-based lazy release consistency (HLRC) distributed shared-memory (DSM) system based on independent checkpointing and logging. Our solution uses laziness in garbage collection and exploits consistency constraints of the HLRC memory model for low overhead and scalability. We prove safe bounds on the state that must be retained in the system to guarantee correct recovery after a failure. We devise two algorithms for garbage collection of checkpoints and logs, checkpoint garbage collection (CGC), and lazy log trimming (LLT). The proposed approach targets large-scale distributed shared-memory computing on local-area clusters of computers. The challenge lies in controlling the size of the logs and the number of checkpoints without global synchronization while tolerating transient disruptions in communication. Evaluation results for real applications show that it effectively bounds the number of past checkpoints to be retained and the size of the logs in stable storage

Published in:

IEEE Transactions on Parallel and Distributed Systems  (Volume:13 ,  Issue: 10 )