Improving disk failure detection accuracy via data augmentation

2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS). Pub Date: 2022-06-10. DOI: 10.1109/IWQoS54832.2022.9812864

Wang Wang, Xuehai Tang, Biyu Zhou, Wenjie Xiao, Jizhong Han, Songlin Hu


Abstract

Frequent disk failures seriously affect the dependability and service quality of cloud data centers. Recently, machine learning (ML) based methods have been widely adopted to proactively predict forthcoming disk failures via supervised learning. However, the severe imbalance between failure samples and healthy samples is a major obstacle to building a high-performance detection model with existing methods. This paper presents MSGMD, a data augmentation method that efficiently generates high-quality failure samples to alleviate the data imbalance of the training set and thereby improve the performance of any supervised failure detection model. First, MSGMD converts failure samples (multivariate time series) into multiple univariate time series by decomposing the spatial relations among features. Then, it learns the temporal correlation of each feature via a policy-based reinforcement learning model trained in an adversarial way. After that, it generates failure samples by combining feature series sampled from the learned distribution. Finally, it filters out low-quality generated samples with a confidence-based method. Experimental results on real-world datasets show that, through data augmentation, MSGMD improves the FDR and F1-Score of the state-of-the-art disk failure detection model by 31.59% and 30.74% on average, respectively.
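
The four stages named in the abstract (decompose, learn per-feature behavior, recombine, filter) can be pictured with a rough sketch. This is not the authors' implementation: the abstract does not give one, and every function name, parameter, and threshold below is hypothetical. In particular, the adversarially trained policy-based reinforcement learning generator is replaced here by a trivial placeholder that jitters a randomly chosen real series, and the confidence score is assumed to come from any user-supplied scoring function.

```python
import random
from typing import Callable, List, Sequence

Series = List[float]   # one feature observed over time
Sample = List[Series]  # one multivariate sample: one series per feature


def decompose(samples: Sequence[Sample]) -> List[List[Series]]:
    """Split multivariate samples into one pool of univariate series per feature."""
    n_features = len(samples[0])
    return [[s[f] for s in samples] for f in range(n_features)]


def sample_feature_series(pool: Sequence[Series], rng: random.Random,
                          noise: float = 0.05) -> Series:
    """PLACEHOLDER generator (assumption): pick a real series and add Gaussian jitter.

    The paper instead learns each feature's temporal correlation with an
    adversarially trained, policy-based RL model; that model is not shown here.
    """
    base = rng.choice(pool)
    return [v + rng.gauss(0.0, noise) for v in base]


def generate(samples: Sequence[Sample], n_new: int, rng: random.Random,
             noise: float = 0.05) -> List[Sample]:
    """Build synthetic samples by combining independently drawn feature series."""
    pools = decompose(samples)
    return [[sample_feature_series(pool, rng, noise) for pool in pools]
            for _ in range(n_new)]


def filter_by_confidence(candidates: Sequence[Sample],
                         score: Callable[[Sample], float],
                         threshold: float = 0.5) -> List[Sample]:
    """Keep only candidates whose confidence score reaches the threshold."""
    return [c for c in candidates if score(c) >= threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy failure samples, each with 2 features over 3 time steps.
    real = [
        [[0.0, 1.0, 2.0], [5.0, 5.0, 4.0]],
        [[0.0, 2.0, 4.0], [6.0, 5.0, 3.0]],
    ]
    # Hypothetical scorer: confident when feature 0 rises over the window.
    rising = lambda s: 1.0 if s[0][-1] > s[0][0] else 0.0
    synthetic = filter_by_confidence(generate(real, 5, rng), rising)
    print(len(synthetic), "synthetic failure samples kept")
```

Drawing each feature from its own pool mirrors the abstract's idea of treating features as separate univariate series before recombining them; the filtering step then exists to discard combinations that no longer look like failures.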
