The growing demand for data storage has spurred the development of systems that rely on the aggregate performance of multiple hard drives. In many of these applications, reliability and availability are of utmost importance, so a complex storage system's reliability characteristics must be closely scrutinized. In this paper, we use Markov models to rigorously demonstrate how failure prediction affects a system's mean time to data loss (MTTDL) as a function of the predictor's sensitivity. We devise models for a single hard drive, RAID1, and N+1 type RAID systems. We find that the standard SMART failure prediction system has little impact on the MTTDL, but the MTTDL improves strikingly once the predictor's sensitivity reaches 0.5 or more. Prior research has proposed machine learning techniques to improve on SMART, showing that sensitivities of 0.5 or more are achievable by training on past SMART data alone. Our stochastic models show that even with such relatively modest predictive power, these failure prediction algorithms can drastically extend the MTTDL of a data storage system. We believe these results underscore the importance of, and need for, more sophisticated prediction systems for anticipating impending hard drive failures.
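To illustrate how a Markov model yields an MTTDL that depends on predictor sensitivity, here is a minimal Python sketch. It is a simplified textbook chain, not the paper's models. It assumes exponential failure and repair times with a per-drive failure rate `lam` and a rebuild rate `mu` (both per hour). It also makes a loud simplifying assumption: a fraction `s` (the sensitivity) of failures is predicted and handled by proactive replacement with no loss of redundancy, so only the unpredicted rate `(1 - s) * lam` drives the chain. MTTDL is computed as the expected time to absorption in the data-loss state by solving `(-Q) t = 1` over the transient states. All function names and parameter values are hypothetical, and RAID1 is treated as the N = 1 case of N+1.

```python
"""Hypothetical sketch: MTTDL from a continuous-time Markov chain with a
failure predictor of sensitivity s (fraction of drive failures predicted)."""


def expected_absorption_times(q):
    """Return the expected time to absorption from each transient state.

    q is the generator restricted to transient states; a row's deficit
    below zero is the rate into the absorbing data-loss state.
    Solves (-q) t = 1 by Gauss-Jordan elimination with partial pivoting.
    """
    n = len(q)
    a = [[-x for x in row] + [1.0] for row in q]  # augmented [-q | 1]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def effective_rate(lam, sensitivity):
    # ASSUMPTION (hypothetical): predicted failures are replaced proactively
    # without losing redundancy, so only unpredicted failures count.
    if not 0.0 <= sensitivity < 1.0:
        raise ValueError("sensitivity must be in [0, 1)")
    return (1.0 - sensitivity) * lam


def mttdl_single_drive(lam, sensitivity):
    # One transient state (drive healthy); any unpredicted failure loses data.
    lam_eff = effective_rate(lam, sensitivity)
    return expected_absorption_times([[-lam_eff]])[0]


def mttdl_n_plus_1(n_data, lam, mu, sensitivity):
    # State 0: all n_data + 1 drives healthy.
    # State 1: one drive lost, rebuild completes at rate mu.
    # A second unpredicted failure in state 1 loses data (absorbing).
    lam_eff = effective_rate(lam, sensitivity)
    q = [
        [-(n_data + 1) * lam_eff, (n_data + 1) * lam_eff],
        [mu, -(mu + n_data * lam_eff)],
    ]
    return expected_absorption_times(q)[0]


if __name__ == "__main__":
    lam = 1.0 / 500_000  # hypothetical: 500,000-hour drive MTTF
    mu = 1.0 / 24        # hypothetical: 24-hour rebuild
    for s in (0.0, 0.5):
        print(f"s={s}: single={mttdl_single_drive(lam, s):.3g} h, "
              f"RAID1={mttdl_n_plus_1(1, lam, mu, s):.3g} h, "
              f"4+1={mttdl_n_plus_1(4, lam, mu, s):.3g} h")
```

In this chain the N+1 result has the closed form ((2N+1)λ′ + μ) / (N(N+1)λ′²) with λ′ = (1 − s)λ, which is close to μ / (N(N+1)λ′²) when μ ≫ λ′. Under these assumptions, a sensitivity of 0.5 therefore roughly quadruples the MTTDL of a redundant array but only doubles it for a single drive.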
Date of Conference: 8-10 Sept. 2008