Examining the impact of critical attributes on hard drive failure times: Multi-state models for left-truncated and right-censored semi-competing risks data
IF 1.3 · Region 4 (Mathematics) · Q3 · MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
Jordan L. Oakley, Matthew Forshaw, Pete Philipson, Kevin J. Wilson
Applied Stochastic Models in Business and Industry · Published 2023-12-03 · DOI: 10.1002/asmb.2829 · https://onlinelibrary.wiley.com/doi/10.1002/asmb.2829
Citations: 0
Abstract
The ability to predict failures in hard disk drives (HDDs) is a major objective of HDD manufacturers since avoiding unexpected failures may prevent data loss, improve service reliability, and reduce data center downtime. Most HDDs are equipped with a threshold-based monitoring system named self-monitoring, analysis and reporting technology (SMART). The system collects several performance metrics, called SMART attributes, and detects anomalies that may indicate incipient failures. SMART works as a nascent failure detection method and does not estimate the HDDs' remaining useful life. We define critical attributes and critical states for hard drives using SMART attributes and fit multi-state models to the resulting semi-competing risks data. The multi-state models provide a coherent and novel way to model the failure time of a hard drive and allow us to examine the impact of critical attributes on the failure time of a hard drive. We derive dynamic predictions of conditional survival probabilities, which are adaptive to the state of the drive. Using a dataset of HDDs equipped with SMART, we find that drives are more likely to fail after entering critical states. We evaluate the predictive accuracy of the proposed models with a case study of HDDs equipped with SMART, using the time-dependent area under the receiver operating characteristic curve (AUC) and the expected prediction error (PE). The results suggest that accounting for changes in the critical attributes improves the accuracy of dynamic predictions.
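To make the idea of state-adaptive conditional survival probabilities concrete, the following is a minimal sketch of a three-state illness-death model (healthy, critical, failed). It is not the authors' fitted model: it assumes constant (exponential) transition hazards, ignores covariates, left truncation, and right censoring, and the function name `conditional_survival` and the hazard parameters `h01`, `h02`, `h12` are hypothetical names introduced here for illustration.

```python
import math


def conditional_survival(state, horizon, h01, h02, h12):
    """Probability that a drive has not failed `horizon` time units from now,
    given its current state, under an illness-death model with constant hazards.

    Assumed (hypothetical) parameters:
      h01: hazard of healthy -> critical
      h02: hazard of healthy -> failed
      h12: hazard of critical -> failed
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")

    if state == "critical":
        # Only one exit from the critical state: failure.
        return math.exp(-h12 * horizon)
    if state != "healthy":
        raise ValueError("state must be 'healthy' or 'critical'")

    exit_rate = h01 + h02
    # Probability of remaining healthy throughout the horizon.
    stay_healthy = math.exp(-exit_rate * horizon)

    # Probability of moving to critical and still being alive at the horizon.
    diff = exit_rate - h12
    if math.isclose(diff, 0.0, abs_tol=1e-12):
        # Limiting case when the two exit rates coincide.
        move_to_critical = h01 * horizon * math.exp(-exit_rate * horizon)
    else:
        move_to_critical = (h01 / diff) * (
            math.exp(-h12 * horizon) - math.exp(-exit_rate * horizon)
        )
    return stay_healthy + move_to_critical


if __name__ == "__main__":
    # Illustrative hazards per day; these values are made up, not estimates.
    rates = dict(h01=0.02, h02=0.001, h12=0.05)
    for s in ("healthy", "critical"):
        print(s, conditional_survival(s, horizon=30, **rates))
```

With a critical-to-failed hazard larger than the healthy-to-failed hazard, the predicted survival probability drops once the drive enters the critical state, which is the qualitative behaviour the abstract describes. A realistic analysis would replace the constant hazards with models estimated from left-truncated and right-censored data.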
About the journal:
ASMBI - Applied Stochastic Models in Business and Industry (formerly Applied Stochastic Models and Data Analysis) was first published in 1985, publishing contributions at the interface between stochastic modelling, data analysis and their applications in business, finance, insurance, management and production. In 2007 ASMBI became the official journal of the International Society for Business and Industrial Statistics (www.isbis.org). The main objective is to publish papers, both technical and practical, presenting new results which solve real-life problems or have great potential to do so. Mathematical rigour, innovative stochastic modelling and sound applications are the key ingredients of papers to be published, after a very selective review process.
The journal is very open to new ideas, like Data Science and Big Data stemming from problems in business and industry or uncertainty quantification in engineering, as well as more traditional ones, like reliability, quality control, design of experiments, managerial processes, supply chains and inventories, insurance, econometrics, and financial modelling (provided the papers are related to real problems). The journal is also interested in papers addressing the effects of business and industrial decisions on the environment, healthcare, and social life. State-of-the-art computational methods are very welcome as well, when combined with sound applications and innovative models.