Resilience is an important property of computing systems. Previous work on resilience has often focused on the design and architecture of such systems rather than on quantifying resilience, and where resilience is quantified, the quantification is often restricted to a limited portion of the system. In networked systems, where many heterogeneous components interact in complex ways, quantifying resilience becomes a nontrivial problem. This paper proposes a model that quantifies resilience on the basis of the interdependencies among services and their adaptation. The model combines performance and adaptability metrics to compute the resilience of individual services. These per-service values are then fed into a Markov network that computes the overall system resilience. The adaptation metric, here called adaptivity, measures how often a service adapts and evaluates the efficiency of those adaptations in terms of performance improvement. This paper also presents an evaluation on critical infrastructure systems.
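The adaptivity metric combines two quantities: how often a service adapts, and how much each adaptation improves performance. The abstract does not give the actual formulas, so the Python sketch below rests on illustrative assumptions. All of its names (`Adaptation`, `adaptivity`, `service_resilience`, and the weight `alpha`) are hypothetical. The adaptation rate is taken to be adaptations per unit time. Efficiency is the mean non-negative performance gain per adaptation. Per-service resilience is a weighted mix of performance and efficiency. This is not the paper's model, and it does not attempt the Markov network step that aggregates services into system resilience.

```python
from dataclasses import dataclass


@dataclass
class Adaptation:
    """One adaptation event of a service (hypothetical record format)."""
    time: float         # when the adaptation happened
    perf_before: float  # performance just before adapting, normalized to [0, 1]
    perf_after: float   # performance just after adapting, normalized to [0, 1]


def adaptivity(events: list[Adaptation], window: float) -> tuple[float, float]:
    """Return (adaptation rate, mean efficiency) over an observation window.

    Assumed definitions, for illustration only:
    - rate: number of adaptations per unit time in the window;
    - efficiency: mean performance gain per adaptation, with gains clipped
      at 0 so that a harmful adaptation counts as no improvement.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not events:
        return 0.0, 0.0
    rate = len(events) / window
    gains = [max(0.0, e.perf_after - e.perf_before) for e in events]
    return rate, sum(gains) / len(gains)


def service_resilience(performance: float, efficiency: float, alpha: float = 0.5) -> float:
    """Hypothetical per-service resilience: a convex mix of performance and efficiency."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * performance + (1 - alpha) * efficiency


if __name__ == "__main__":
    events = [Adaptation(1.0, 0.6, 0.8), Adaptation(5.0, 0.7, 0.65)]
    rate, eff = adaptivity(events, window=10.0)
    print(rate, eff, service_resilience(0.9, eff))
```

In this example, the two adaptations in a 10-unit window give a rate of 0.2. Only the first adaptation improves performance, by 0.2, so the mean efficiency is about 0.1. A convex weight keeps the per-service score in [0, 1] whenever both inputs are in [0, 1]. That property would be convenient if the scores were later used as inputs to a probabilistic model such as the Markov network, although how the paper maps them onto that network is not stated here.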
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.