Minimizing Overheads of Checkpoints in Distributed Stream Processing Systems | IEEE Conference Publication | IEEE Xplore

Abstract:

Failures are inevitable in large-scale systems, which makes resilience a key challenge for modern systems. Checkpointing with rollback recovery is a well-known approach to providing fault tolerance in distributed systems. Checkpoint-based fault tolerance periodically persists the application state to reliable storage, where it serves as a recovery point in case of failure. Such periodic checkpoints are not in line with the failure rate of these systems, as many studies conclude that failures do not occur periodically. The length of the checkpoint interval is a crucial decision, because it directly determines the checkpoint overhead. To minimize this overhead, we propose reducing the number of checkpoints taken during application execution by successively increasing the checkpoint interval. We consider the failure probability of the underlying infrastructure and iteratively increase the checkpoint interval. The proposed approach tailors checkpoint initiation to the failure probability: if the failure probability is low, it increases the checkpoint interval, which eventually reduces the total number of checkpoints triggered over the application's lifetime and thereby the checkpoint overhead. The experimental results show that the proposed checkpoint policy considerably reduces checkpoint overhead compared to periodic checkpoints.
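
The abstract does not state the exact rule used to grow the interval, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a geometric growth factor, a fixed probability threshold, and a reset to the base interval when the failure probability is high. All names and parameters (`failure_prob`, `threshold`, `growth`, `max_interval`) are hypothetical.

```python
def periodic_checkpoint_times(run_length, interval):
    """Checkpoint times for a fixed-interval (periodic) policy."""
    times = []
    t = interval
    while t < run_length:
        times.append(t)
        t += interval
    return times


def adaptive_checkpoint_times(run_length, base_interval, failure_prob,
                              threshold=0.05, growth=2.0, max_interval=None):
    """Checkpoint times when the interval grows while failure risk is low.

    failure_prob is a caller-supplied function mapping a time to an
    estimated failure probability (an assumption of this sketch; the
    abstract does not say how the probability is obtained).
    """
    times = []
    interval = base_interval
    t = interval
    while t < run_length:
        times.append(t)
        if failure_prob(t) < threshold:
            # Low risk: lengthen the next interval (assumed geometric growth).
            interval *= growth
            if max_interval is not None:
                interval = min(interval, max_interval)
        else:
            # High risk: fall back to the base interval (assumed behavior).
            interval = base_interval
        t += interval
    return times


if __name__ == "__main__":
    run_length = 3600      # seconds of application execution
    base_interval = 60     # seconds between periodic checkpoints

    periodic = periodic_checkpoint_times(run_length, base_interval)
    adaptive = adaptive_checkpoint_times(run_length, base_interval,
                                         failure_prob=lambda t: 0.01)
    print(len(periodic), "periodic checkpoints")
    print(len(adaptive), "adaptive checkpoints")
```

With a constant low failure probability, the adaptive schedule in this sketch takes far fewer checkpoints than the periodic one over the same run. With a constant high probability it degenerates to the periodic schedule. The trade-off, which the sketch does not model, is that longer intervals mean more work is lost if a failure does occur.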
Date of Conference: 22-24 October 2018
Date Added to IEEE Xplore: 29 November 2018
Conference Location: Tokyo, Japan
