Failure Prediction with Adaptive Multi-scale Sampling and Activation Pattern Regularization

Yujin Tang, Shinya Wada, K. Yoshihara
{"title":"基于自适应多尺度采样和激活模式正则化的故障预测","authors":"Yujin Tang, Shinya Wada, K. Yoshihara","doi":"10.1109/ICDMW.2017.17","DOIUrl":null,"url":null,"abstract":"We treat failure prediction in a supervised learning framework using a convolutional neural network (CNN). Due to the nature of the problem, learning a CNN model on this kind of dataset is generally associated with three primary problems: 1) negative samples (indicating a healthy system) outnumber positives (indicating system failures) by a great margin; 2) implementation design often requires chopping an original time series into sub-sequences, defining a segmentation window size with sufficient data augmentation and avoiding serious multiple-instance learning issue is non-trivial; 3) positive samples may have a common underlying cause and thus present similar features, negative samples can have various latent characteristics which can \"distract\" CNN in the learning process. While the first problem has been extensively discussed in literatures, the last two issues are less explored in the context of deep learning using CNN. We mitigate the second problem by introducing a random variable on sample scaling parameters, whose distribution's parameters are jointly learnt with CNN and leads to what we call adaptive multi-scale sampling (AMS). To address the third problem, we propose activation pattern regularization (APR) on only positive samples such that the CNN focuses on learning representations pertaining to the underlying common cause. We demonstrate the effectiveness of our proposals on a past Kaggle contest dataset that predicts seizures from EEG data. Compared to the baseline method with a CNN trained in traditional scheme, we observe significant performance improvement for both proposed methods. When combined, our model without any sophisticated hyper-parameter tuning or ensemble methods shows a near 10% relative improvement on AUROC and is able to send us to the 14th place on the contest's leaderboard while the highest rank the baseline can reach is 77th.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Failure Prediction with Adaptive Multi-scale Sampling and Activation Pattern Regularization\",\"authors\":\"Yujin Tang, Shinya Wada, K. Yoshihara\",\"doi\":\"10.1109/ICDMW.2017.17\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We treat failure prediction in a supervised learning framework using a convolutional neural network (CNN). Due to the nature of the problem, learning a CNN model on this kind of dataset is generally associated with three primary problems: 1) negative samples (indicating a healthy system) outnumber positives (indicating system failures) by a great margin; 2) implementation design often requires chopping an original time series into sub-sequences, defining a segmentation window size with sufficient data augmentation and avoiding serious multiple-instance learning issue is non-trivial; 3) positive samples may have a common underlying cause and thus present similar features, negative samples can have various latent characteristics which can \\\"distract\\\" CNN in the learning process. 
While the first problem has been extensively discussed in literatures, the last two issues are less explored in the context of deep learning using CNN. We mitigate the second problem by introducing a random variable on sample scaling parameters, whose distribution's parameters are jointly learnt with CNN and leads to what we call adaptive multi-scale sampling (AMS). To address the third problem, we propose activation pattern regularization (APR) on only positive samples such that the CNN focuses on learning representations pertaining to the underlying common cause. We demonstrate the effectiveness of our proposals on a past Kaggle contest dataset that predicts seizures from EEG data. Compared to the baseline method with a CNN trained in traditional scheme, we observe significant performance improvement for both proposed methods. When combined, our model without any sophisticated hyper-parameter tuning or ensemble methods shows a near 10% relative improvement on AUROC and is able to send us to the 14th place on the contest's leaderboard while the highest rank the baseline can reach is 77th.\",\"PeriodicalId\":389183,\"journal\":{\"name\":\"2017 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW.2017.17\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2017.17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

We treat failure prediction as a supervised learning problem and address it with a convolutional neural network (CNN). Due to the nature of the problem, learning a CNN model on this kind of dataset typically faces three primary difficulties: 1) negative samples (indicating a healthy system) outnumber positive samples (indicating system failures) by a large margin; 2) the implementation usually requires chopping the original time series into sub-sequences, and choosing a segmentation window size that allows sufficient data augmentation while avoiding a serious multiple-instance learning issue is non-trivial; 3) positive samples may share a common underlying cause and therefore present similar features, whereas negative samples can exhibit diverse latent characteristics that "distract" the CNN during learning. The first problem has been discussed extensively in the literature, but the last two are less explored in the context of deep learning with CNNs. We mitigate the second problem by introducing a random variable over the sample scaling parameters; the parameters of its distribution are learnt jointly with the CNN, leading to what we call adaptive multi-scale sampling (AMS). To address the third problem, we propose activation pattern regularization (APR), applied only to positive samples, so that the CNN focuses on learning representations related to the underlying common cause. We demonstrate the effectiveness of both proposals on a past Kaggle contest dataset for predicting seizures from EEG data. Compared to a baseline CNN trained in the traditional scheme, both proposed methods yield significant performance improvements. Combined, and without any sophisticated hyper-parameter tuning or ensembling, our model shows a nearly 10% relative improvement in AUROC and reaches 14th place on the contest's leaderboard, whereas the best rank the baseline can reach is 77th.
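The abstract does not spell out how the scale distribution is parameterized or how gradients reach it, so the following is only a minimal PyTorch sketch of the AMS idea under stated assumptions: the per-sample scaling factor is drawn from a log-normal distribution with learnable mean and spread, the reparameterization trick carries gradients through the draw, and a fixed-length window is resampled from the raw series by differentiable linear interpolation. The class and parameter names (`AdaptiveMultiScaleSampler`, `out_len`, `mu`, `log_sigma`) are hypothetical; the paper's actual formulation may differ.

```python
import torch
import torch.nn as nn

class AdaptiveMultiScaleSampler(nn.Module):
    """Hypothetical AMS sketch: draw a window-scale factor from a learnable
    log-normal distribution (reparameterization trick) and resample a
    fixed-length window from the raw series at that scale, so the scale
    distribution can be trained jointly with the CNN."""

    def __init__(self, out_len=256, init_scale=1.0):
        super().__init__()
        self.out_len = out_len
        # Learnable parameters of the (assumed) log-normal scale distribution.
        self.mu = nn.Parameter(torch.log(torch.tensor(init_scale)))
        self.log_sigma = nn.Parameter(torch.tensor(-2.0))

    def forward(self, x):
        # x: (batch, channels, length) raw sub-sequence.
        b, c, length = x.shape
        # Reparameterized draw of one scale factor per sample.
        eps = torch.randn(b, device=x.device)
        scale = torch.exp(self.mu + eps * torch.exp(self.log_sigma))      # (batch,)
        # Continuous sampling positions, centred, with width scale * out_len.
        base = torch.linspace(-0.5, 0.5, self.out_len, device=x.device)   # (out_len,)
        pos = (length - 1) * (0.5 + scale[:, None] * base[None, :])       # (batch, out_len)
        pos = pos.clamp(0, length - 1)
        # Differentiable linear interpolation so gradients reach mu / log_sigma.
        lo = pos.floor().long()
        hi = (lo + 1).clamp(max=length - 1)
        w = (pos - lo.float()).unsqueeze(1)                                # (batch, 1, out_len)
        x_lo = torch.gather(x, 2, lo.unsqueeze(1).expand(-1, c, -1))
        x_hi = torch.gather(x, 2, hi.unsqueeze(1).expand(-1, c, -1))
        return (1 - w) * x_lo + w * x_hi                                   # (batch, channels, out_len)
```

In use, such a sampler would sit in front of the CNN and its two distribution parameters would be placed in the same optimizer as the CNN weights, which is what would make the scale distribution "jointly learnt" in the sense described above.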
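Likewise, the abstract only states that APR is applied to positive samples so that the network concentrates on their common cause. One plausible (assumed) reading is a penalty that pulls the hidden representations of the positive samples in each mini-batch toward their batch mean while leaving negatives untouched; the sketch below follows that reading. The function name, the choice of layer, and the weight `1e-3` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def activation_pattern_regularizer(features, labels, weight=1e-3):
    """Hypothetical APR sketch: penalize the spread of positive-sample
    activations around their mean, encouraging positives (failures) to
    share one activation pattern. features: (batch, dim) from some CNN
    layer; labels: (batch,) with 1 = failure, 0 = healthy."""
    pos = features[labels == 1]
    if pos.shape[0] < 2:               # nothing to regularize in this batch
        return features.new_zeros(())
    centred = pos - pos.mean(dim=0, keepdim=True)
    return weight * centred.pow(2).mean()

# Training step (sketch): add the APR term to the ordinary task loss, e.g.
#   logits, feats = cnn(x)             # feats: penultimate-layer activations
#   loss = F.binary_cross_entropy_with_logits(logits, y.float()) \
#          + activation_pattern_regularizer(feats, y)
```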