In a distributed system, identifying consistent checkpoints is essential for error recovery and debugging. We design an efficient incremental algorithm that identifies all consistent checkpoints and all removable checkpoints each time a new checkpoint is reported. Discarding the removable checkpoints as they are identified minimizes the required memory space. The algorithm achieves this in only O(p²M) total time, where p is the number of processes and M is the number of checkpoints.
Published in:
Proceedings of the 1998 International Conference on Parallel and Distributed Systems
Date of Conference: 14-16 Dec 1998