A self-monitoring analysis and reporting technology dataset of 147,496 hard disks
Shuting Wei, Hongzhang Yang, Zhengguang Chen, Ping Wang
Scientific Data 12 (1), 1125. Published 2025-07-02. DOI: 10.1038/s41597-025-05457-z (https://doi.org/10.1038/s41597-025-05457-z)
Abstract
To support research on hard disk failure prediction, this paper introduces SMART-Z, a dataset of SMART data from 147,496 hard disks, collected periodically from March 2017 to February 2018 in the enterprise production environment of a large distributed video data center in China. The dataset covers 65 hard disk models; 712 of the disks failed and the rest are healthy. To minimize interference with business workloads, data acquisition used predefined peak-hour exclusion lists, multi-dimensional monitoring, and an intelligent fuse strategy to keep the production systems running stably. Compared with similar open-source datasets, SMART-Z additionally discloses the critical value, worst value, device IP, business scenario, drive letter name, and other attributes. These attributes help researchers track changes in hard disk capacity through time series analysis, compute regional equipment distribution statistics by business scenario, and thereby build hard disk failure prediction models. After verification, only 5.3% of the values in our dataset are blank, compared with 14.78% for the 2022 Backblaze ST4000DM000 hard disk data.
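The blank-data comparison above can be illustrated with a minimal sketch. The abstract does not say how blank values were counted, so this assumes that a cell counts as blank when it is missing or contains only whitespace, and that the share is taken over all cells in the selected attribute columns. The column names and the sample rows are hypothetical and are not taken from SMART-Z.

```python
import csv
import io


def blank_fraction(rows, columns):
    """Return the share of blank cells across the given columns.

    Assumption: a cell is blank if it is missing or whitespace-only.
    """
    total = 0
    blank = 0
    for row in rows:
        for col in columns:
            total += 1
            value = row.get(col)
            if value is None or value.strip() == "":
                blank += 1
    return blank / total if total else 0.0


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical sample: two disks, three SMART attribute columns.
SAMPLE = """serial,smart_5_raw,smart_9_raw,smart_194_raw
A1,0,1200,
A2,,1350,31
"""

if __name__ == "__main__":
    cols = ["smart_5_raw", "smart_9_raw", "smart_194_raw"]
    print(f"blank share: {blank_fraction(load_rows(SAMPLE), cols):.2%}")
```

Applied to a real export, the same function could be run per disk model to see which attributes drive the blank share, though the result depends on which columns are counted.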
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than testing hypotheses or presenting new interpretations, methods, or in-depth analyses.