{"title":"Failure Prediction Based on Anomaly Detection for Complex Core Routers","authors":"Shi Jin, Zhaobo Zhang, K. Chakrabarty, Xinli Gu","doi":"10.1145/3240765.3243476","DOIUrl":null,"url":null,"abstract":"Data-driven prognostic health management is essential to ensure high reliability and rapid error recovery in commercial core router systems. The effectiveness of prognostic health management depends on whether failures can be accurately predicted with sufficient lead time. This paper describes how time-series analysis and machine-learning techniques can be used to detect anomalies and predict failures in complex core router systems. First both a feature-categorization-based hybrid method and a changepoint-based method have been developed to detect anomalies in time-varying features with different statistical characteristics. Next, a SVM-based failure predictor is developed to predict both categories and lead time of system failures from collected anomalies. A comprehensive set of experimental results is presented for data collected during 30 days of field operation from over 20 core routers deployed by customers of a major telecom company.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3240765.3243476","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
Data-driven prognostic health management is essential to ensure high reliability and rapid error recovery in commercial core router systems. The effectiveness of prognostic health management depends on whether failures can be accurately predicted with sufficient lead time. This paper describes how time-series analysis and machine-learning techniques can be used to detect anomalies and predict failures in complex core router systems. First both a feature-categorization-based hybrid method and a changepoint-based method have been developed to detect anomalies in time-varying features with different statistical characteristics. Next, a SVM-based failure predictor is developed to predict both categories and lead time of system failures from collected anomalies. A comprehensive set of experimental results is presented for data collected during 30 days of field operation from over 20 core routers deployed by customers of a major telecom company.