In this paper, we propose a simple and efficient approach to recovery in a distributed computing environment. The checkpointing scheme used in this work ensures that, after the system recovers from a failure, all processes can restart from their respective most recent checkpoints, thus avoiding any domino effect. That is, the most recent checkpoints always form a consistent recovery line. The recovery scheme handles both orphan and lost messages. Therefore, the correctness of the computation of the underlying application program is guaranteed.
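To make the terms concrete, the following minimal sketch classifies messages against a set of checkpoints using the standard textbook definitions. A message is an orphan if its receipt is recorded in the receiver's checkpoint but its send is not recorded in the sender's; it is lost if the send is recorded but the receipt is not. It is not the algorithm of this paper, which the abstract does not detail. The names `Message`, `classify`, and `is_consistent`, and the use of per-process local event counters as timestamps, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    send_time: int  # sender's local event counter when the message was sent
    recv_time: int  # receiver's local event counter when it was delivered


def classify(messages, checkpoints):
    """Split messages into orphan and lost ones relative to a set of checkpoints.

    `checkpoints` maps each process name to the local event counter at
    which its most recent checkpoint was taken (hypothetical representation).
    """
    orphans, lost = [], []
    for m in messages:
        sent_before = m.send_time <= checkpoints[m.sender]
        received_before = m.recv_time <= checkpoints[m.receiver]
        if received_before and not sent_before:
            # Receipt is recorded, send is not: orphan message.
            orphans.append(m)
        elif sent_before and not received_before:
            # Send is recorded, receipt is not: lost message.
            lost.append(m)
    return orphans, lost


def is_consistent(messages, checkpoints):
    """A set of checkpoints is a consistent recovery line if it has no orphans."""
    orphans, _ = classify(messages, checkpoints)
    return not orphans


# Illustrative data only; the counters are local to each process.
messages = [
    Message("P1", "P2", send_time=2, recv_time=3),  # fully recorded
    Message("P2", "P1", send_time=5, recv_time=7),  # lost
    Message("P1", "P2", send_time=6, recv_time=4),  # orphan
]
checkpoints = {"P1": 4, "P2": 6}

orphans, lost = classify(messages, checkpoints)
consistent = is_consistent(messages, checkpoints)
```

In this example the third message makes the checkpoint set inconsistent, because P2's checkpoint records a receipt that P1's checkpoint never sent. The second message is only lost, which does not break consistency but would need to be delivered again after restart for the computation to remain correct.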
Published in:
Industrial Informatics (INDIN), 2012 10th IEEE International Conference on
Date of Conference: 25-27 July 2012