A numerical approach for computing optimal dynamic checkpointing strategies for general rollback and recovery systems is presented. The system is modeled as a Markov renewal decision process. General failure distributions, random checkpointing durations, and reprocessing-dependent recovery times are allowed. The aim is to find a dynamic decision rule that maximizes the average system availability over an infinite time horizon. A computational approach to approximate such a rule is proposed. This approach is based on value-iteration stochastic dynamic programming with spline or finite-element approximation of the value and policy functions. Numerical illustrations are provided.
Published in: IEEE Transactions on Computers (Volume: 37, Issue: 4)
Date of Publication: Apr 1988
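The value-iteration idea in the abstract can be illustrated with a deliberately simplified toy model. This is a minimal sketch, not the authors' method, and it rests on several assumptions that are not in the source. Failures are taken to be exponential with rate `lam`, so the only state needed is the uncommitted work `x` since the last checkpoint. The paper allows general failure distributions, which would require additional state. The checkpoint duration `ckpt` is fixed rather than random. The recovery time is a hypothetical reprocessing-dependent form `r0 + a*x`. Availability is defined as committed work per unit time. Two substitutions should be stated plainly. Piecewise-linear (P1 finite-element) interpolation stands in for splines. The unequal sojourn times of the semi-Markov model are handled with Schweitzer's data transformation, followed by relative value iteration. All function names and parameter values are hypothetical.

```python
import math


def interp_uniform(values, dx, x):
    """Piecewise-linear (P1 finite-element) interpolation of grid values on
    the uniform grid 0, dx, 2*dx, ...; clamps to the end values outside it."""
    s = x / dx
    if s <= 0.0:
        return values[0]
    if s >= len(values) - 1:
        return values[-1]
    i = int(s)
    w = s - i
    return (1.0 - w) * values[i] + w * values[i + 1]


def solve_checkpointing(lam=0.01, h=1.0, ckpt=2.0, r0=5.0, a=0.5,
                        x_max=100.0, n_grid=41, tol=1e-7, max_iter=200_000):
    """Relative value iteration for a hypothetical toy checkpointing model.

    State x = work done since the last checkpoint (continuous, approximated
    on a uniform grid). Returns (availability estimate, grid, actions).
    """
    dx = x_max / (n_grid - 1)
    grid = [i * dx for i in range(n_grid)]
    p_fail = 1.0 - math.exp(-lam * h)  # P(failure during one work step), exponential assumption
    tau = 0.9 * min(h, ckpt)           # strictly below every expected sojourn time
    v = [0.0] * n_grid
    lo = hi = 0.0
    for _ in range(max_iter):
        new_v, actions = [], []
        for i, x in enumerate(grid):
            # Continue: work for h. On failure, x is lost and recovery takes r0 + a*x.
            t_c = h + p_fail * (r0 + a * x)
            ev_c = (1.0 - p_fail) * interp_uniform(v, dx, x + h) + p_fail * v[0]
            q_c = (tau / t_c) * ev_c + (1.0 - tau / t_c) * v[i]
            # Checkpoint: takes ckpt time units, commits x units of work, resets x to 0.
            q_k = x / ckpt + (tau / ckpt) * v[0] + (1.0 - tau / ckpt) * v[i]
            if q_k > q_c:
                new_v.append(q_k)
                actions.append("checkpoint")
            else:
                new_v.append(q_c)
                actions.append("continue")
        diffs = [nv - ov for nv, ov in zip(new_v, v)]
        lo, hi = min(diffs), max(diffs)     # bounds on the average availability
        v = [nv - new_v[0] for nv in new_v]  # renormalize (relative value iteration)
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi), grid, actions
```

Calling `g, grid, actions = solve_checkpointing()` returns an availability estimate `g` and the action chosen at each grid point. The first grid point labeled `"checkpoint"` approximates a threshold-type decision rule. This form is expected only because of the memoryless failure assumption. Under general failure distributions, the rule would also depend on the time since the last failure. Swapping `interp_uniform` for a spline interpolant would bring the sketch closer to the spline variant mentioned in the abstract.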