Abstract:
Online service systems have been increasingly popular and important nowadays. Reducing the MTTR (Mean Time to Restore) of a service remains one of the most important step...Show MoreMetadata
Abstract:
Online service systems have been increasingly popular and important nowadays. Reducing the MTTR (Mean Time to Restore) of a service remains one of the most important steps to assure the user-perceived availability of the service. To reduce the MTTR, a common practice is to restore the service by identifying and applying an appropriate healing action. In this paper, we present an automated mining-based approach for suggesting an appropriate healing action for a given new issue. Our approach suggests an appropriate healing action by adapting healing actions from the retrieved similar historical issues. We have applied our approach to a real-world and large-scale product online service. The studies on 243 real issues of the service show that our approach can effectively suggest appropriate healing actions (with 87% accuracy) to reduce the MTTR of the service. In addition, according to issue characteristics, we further study and categorize issues where automatic healing suggestion faces difficulties.
Published in: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Date of Conference: 23-26 June 2014
Date Added to IEEE Xplore: 22 September 2014
Electronic ISBN:978-1-4799-2233-8