SSD Drive Failure Prediction on Alibaba Data Center Using Machine Learning | IEEE Conference Publication | IEEE Xplore


Abstract:

Flash-based Solid-State Drives (SSDs) have become a critical storage tier in data centers and enterprise storage systems. Cloud companies are very interested in predicting drive failures: failure prediction lets operators schedule drive replacement and back up data beforehand, and helps them plan drive purchasing strategies. Solidigm and Alibaba collaborate to collect and analyze Self-Monitoring, Analysis, and Reporting Technology (SMART) data and to predict SSD failures 30 days ahead of time using machine learning techniques. In this paper, we use group k-fold cross-validation to select the best parameters for the machine learning models and to avoid overfitting. After obtaining a prediction score for each sample from the model, we apply neural-network post-processing to those scores to obtain a drive-level prediction. A modified ensemble learning method, implemented as majority voting over different LightGBM and Random Forest models, further improves the prediction results. This paper is the first work in both academia and the storage industry to design a drive failure prediction system for deployment in data centers by optimizing models for the highest Precision instead of the highest F1-score, in order to minimize the false positive rate. We achieve drive failure prediction with 100% Precision and 21% Recall, which avoids the high cost of false positives.
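As a rough illustration of three ideas named in the abstract (group-wise cross-validation splits, majority voting, and the Precision/Recall trade-off), the following standard-library Python sketch may help. It is not the authors' implementation: the paper's models are LightGBM and Random Forest and its voting scheme is described as modified, while every function name, the fold-assignment rule, and the toy data here are hypothetical.

```python
from collections import defaultdict


def group_k_fold(groups, k):
    """Yield (train_idx, val_idx) index lists for k folds.

    All samples sharing a group id (assumed here: one drive) land in the
    same fold, so no drive contributes to both training and validation.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    folds = [[] for _ in range(k)]
    # Greedy balancing (an assumption): largest groups go to the smallest fold.
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[g])
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, val


def majority_vote(predictions):
    """Combine per-model 0/1 predictions; flag a drive only if strictly
    more than half of the models flag it."""
    n_models = len(predictions)
    return [int(sum(votes) * 2 > n_models) for votes in zip(*predictions)]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 = failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): two SMART samples per drive, four drives.
drive_ids = ["A", "A", "B", "B", "C", "C", "D", "D"]
splits = list(group_k_fold(drive_ids, k=2))

# Drive-level votes from three hypothetical models for four drives.
model_votes = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
ensemble = majority_vote(model_votes)
precision, recall = precision_recall([1, 0, 1, 0], ensemble)
print(ensemble, precision, recall)  # [1, 0, 0, 0] 1.0 0.5
```

In the toy example the vote flags only the drive that most models agree on, so no healthy drive is flagged (Precision 1.0) but one failing drive is missed (Recall 0.5). This is the same kind of trade-off the abstract reports when it favors Precision over F1-score.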
Date of Conference: 15-18 May 2022
Date Added to IEEE Xplore: 25 May 2022
Conference Location: Dresden, Germany
