By Topic

Failure Prediction Models for Proactive Fault Tolerance within Storage Systems

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Ben Eckart ; Tennessee Technological University, ; Xin Chen ; Xubin He ; Stephen L. Scott

The increasingly large demand for data storage has spurred on the development of systems that rely on the aggregate performance of multiple hard drives. In many of these applications, reliability and availability are of utmost importance. It is therefore necessary to closely scrutinize a complex storage system's reliability characteristics. In this paper, we use Markov models to rigorously demonstrate the effects that failure prediction has on a system's mean time to data loss (MTTDL) given a parameterized sensitivity. We devise models for a single hard drive, RAID1, and N+1 type RAID systems. We find that the normal SMART failure prediction system has little impact on the MTTDL, but striking results can be seen when the sensitivity of the predictor reaches 0.5 or more. In past research, machine learning techniques have been proposed to improve SMART, showing that sensitivity levels of 0.5 or more are possible by training on past SMART data alone. The results of our stochastic models show that even with such relatively modest predictive power, these failure prediction algorithms can drastically extend the MTTDL of a data storage system. We feel that these results underscore the importance and need for complex prediction systems when calculating impending hard drive failures.

Published in:

2008 IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems

Date of Conference:

8-10 Sept. 2008