A self-monitoring analysis and reporting technology dataset of 147,496 hard disks.

IF 5.8 2区 综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES
Shuting Wei, Hongzhang Yang, Zhengguang Chen, Ping Wang
{"title":"A self-monitoring analysis and reporting technology dataset of 147,496 hard disks.","authors":"Shuting Wei, Hongzhang Yang, Zhengguang Chen, Ping Wang","doi":"10.1038/s41597-025-05457-z","DOIUrl":null,"url":null,"abstract":"<p><p>In order to study hard disk failure prediction,this paper introduces SMART-Z, a dataset comprising 147,496 pieces of hard disk SMART data periodically collected by a large distributed video data center in China in the enterprise application environment from March 2017 to February 2018. There are 65 types of hard disk models, including 712 failure disks and the rest are healthy disks. To minimize business interference,data acquisition utilized predefined peak-hour exclusion lists, multi-dimensional monitoring, and an intelligent fuse strategy to effectively guarantee the stable operation. Compared to similar open source datasets, SMART-Z additionally discloses the critical value, worst value, device IP, business scenario, drive letter name and other attributes, which is helpful for researchers to track the change of hard disk capacity through time series analysis, and realize regional equipment distribution statistics by business scenario dimensions, thereby building hard disk failure prediction model. After verification, our dataset exhibits only 5.3% blank data, outperforming the 2022 Backblaze ST4000DM000 hard disk data, where the blank value accounts for 14.78% of the total data.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"1125"},"PeriodicalIF":5.8000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific Data","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41597-025-05457-z","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

Abstract

In order to study hard disk failure prediction,this paper introduces SMART-Z, a dataset comprising 147,496 pieces of hard disk SMART data periodically collected by a large distributed video data center in China in the enterprise application environment from March 2017 to February 2018. There are 65 types of hard disk models, including 712 failure disks and the rest are healthy disks. To minimize business interference,data acquisition utilized predefined peak-hour exclusion lists, multi-dimensional monitoring, and an intelligent fuse strategy to effectively guarantee the stable operation. Compared to similar open source datasets, SMART-Z additionally discloses the critical value, worst value, device IP, business scenario, drive letter name and other attributes, which is helpful for researchers to track the change of hard disk capacity through time series analysis, and realize regional equipment distribution statistics by business scenario dimensions, thereby building hard disk failure prediction model. After verification, our dataset exhibits only 5.3% blank data, outperforming the 2022 Backblaze ST4000DM000 hard disk data, where the blank value accounts for 14.78% of the total data.

147,496块硬盘的自监测分析报告技术数据集。
为了研究硬盘故障预测,本文引入了SMART- z数据集,该数据集由中国某大型分布式视频数据中心于2017年3月至2018年2月在企业应用环境下定期采集的147,496条硬盘SMART数据组成。硬盘型号共有65种,其中故障硬盘712块,其余硬盘为健康硬盘。为了最大限度地减少业务干扰,数据采集采用了预定义的高峰时段排除列表、多维度监控和智能融合策略,有效保证了系统的稳定运行。与同类开源数据集相比,SMART-Z还公开了临界值、最差值、设备IP、业务场景、盘符名称等属性,有助于研究人员通过时间序列分析跟踪硬盘容量的变化,并通过业务场景维度实现区域设备分布统计,从而构建硬盘故障预测模型。经过验证,我们的数据集只有5.3%的空白数据,优于2022年Backblaze ST4000DM000硬盘数据,其中空白值占总数据的14.78%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Scientific Data
Scientific Data Social Sciences-Education
CiteScore
11.20
自引率
4.10%
发文量
689
审稿时长
16 weeks
期刊介绍: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than testing hypotheses or presenting new interpretations, methods, or in-depth analyses.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信