General Feature Selection for Failure Prediction in Large-scale SSD Deployment
Fan Xu, Shujie Han, P. Lee, Yi Liu, Cheng He, Jiongzhou Liu
2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2021
DOI: 10.1109/DSN48987.2021.00039 (https://doi.org/10.1109/DSN48987.2021.00039)
Citations: 11
Abstract
Solid-state drive (SSD) failures are likely to cause system-level failures and downtime, which makes SSD failure prediction critical to large-scale SSD deployment. Existing SSD failure prediction studies are mostly based on customized SSDs with proprietary monitoring metrics, so their results are difficult to reproduce. To support general SSD failure prediction across different drive models and vendors, this paper proposes Wear-out-updating Ensemble Feature Ranking (WEFR), which selects SMART attributes as learning features in an automated and robust manner. WEFR combines different feature ranking results and automatically generates the final feature selection based on complexity measures and change point detection of wear-out degrees. We evaluate our approach using a dataset of nearly 500K working SSDs at Alibaba. Our results show that the proposed approach is effective and outperforms related approaches. We have successfully applied the proposed approach to improve the reliability of cloud storage systems in production SSD-based data centers. We release our dataset for public use.
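The abstract describes combining several feature rankings into one final selection but gives no algorithmic detail. The sketch below illustrates only the general idea of ensemble rank aggregation. It is not the WEFR procedure. It assumes mean-rank aggregation and a fixed cutoff `k`, whereas the abstract says WEFR picks the cutoff automatically from complexity measures and updates it using change point detection on wear-out degrees. The function names, SMART attribute names, and rankings are hypothetical.

```python
from statistics import mean


def aggregate_rankings(rankings):
    """Merge several feature rankings (best feature first) by mean rank position.

    Assumption: simple mean-rank aggregation, chosen for illustration only.
    """
    features = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != features:
            raise ValueError("every ranking must cover the same features")
    # Mean position of each feature across all rankers (0 = best).
    mean_rank = {f: mean(r.index(f) for r in rankings) for f in features}
    # Lower mean rank is better; ties are broken by name for determinism.
    return sorted(features, key=lambda f: (mean_rank[f], f))


def select_top_k(rankings, k):
    """Keep the k best features of the aggregated ranking.

    Assumption: k is fixed by the caller, which stands in for the
    automated cutoff that the abstract attributes to WEFR.
    """
    return aggregate_rankings(rankings)[:k]


# Hypothetical rankings of SMART attributes from three different rankers.
rankings = [
    ["wear_leveling", "reallocated_sectors", "program_fail", "power_on_hours"],
    ["reallocated_sectors", "wear_leveling", "power_on_hours", "program_fail"],
    ["reallocated_sectors", "program_fail", "wear_leveling", "power_on_hours"],
]
selected = select_top_k(rankings, 2)
print(selected)
```

Mean-rank aggregation is used here because it needs no tuning and lets no single ranker dominate. Other schemes, such as median rank or weighted voting, would slot into `aggregate_rankings` the same way.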