Lanlan Pan, Zhaojun Gu, Yitong Ren, Chunbo Liu, Zhi Wang
Published in: 2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC), 2020-07-01. DOI: 10.1109/DSC50466.2020.00063 (https://doi.org/10.1109/DSC50466.2020.00063)
An Anomaly Detection Method for System Logs Using Venn-Abers Predictors
System logs record the system status and important events during operation in detail, and detecting anomalies from these logs is a common practice in modern large-scale distributed systems. When machine learning algorithms are applied to system log anomaly detection, threshold-based classification models output only a simple normal or abnormal prediction, with no estimate of the probability that the prediction is correct. In this paper, a statistical learning algorithm, the Venn-Abers predictor, is used to evaluate the confidence of prediction results in system log anomaly detection. It can calculate the label probability distribution for a set of samples and provides a quality assessment of the predicted labels with a degree of certainty. Two Venn-Abers predictors were implemented, one based on logistic regression and one based on a support vector machine. Experiments were carried out on the log data set of the distributed file management system HDFS. The two Venn-Abers predictors and the two underlying algorithms are compared in terms of log anomaly detection accuracy and validity. Compared with the underlying machine learning algorithms, the Venn-Abers predictor based on the support vector machine achieves better results: it reduces the false positive rate from 12% to 3%, improves the recall rate from 81% to 94%, and reduces the loss value to 0.04. Experimental results show that Venn-Abers is a flexible tool that can make accurate and valid probability predictions in the field of system log anomaly detection.
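The abstract does not give the authors' implementation, so the following is only a minimal sketch of how a Venn-Abers predictor can wrap an underlying scoring classifier. It assumes the inductive variant: scores from the underlying model (for example a logistic regression probability or an SVM decision value) on a held-out calibration set are fitted with isotonic regression twice, once with the test object tentatively labeled 0 and once labeled 1, which yields a probability pair (p0, p1) for the anomalous class. The function names `isotonic_fit`, `venn_abers` and `merge`, and the toy scores, are hypothetical and chosen for illustration.

```python
def isotonic_fit(scores, labels):
    """Isotonic (non-decreasing) least-squares fit via pool-adjacent-violators.

    Returns a dict mapping each distinct score to its fitted value.
    """
    # Aggregate tied scores so that equal scores share one fitted value.
    agg = {}
    for s, y in zip(scores, labels):
        total, count = agg.get(s, (0.0, 0))
        agg[s] = (total + y, count + 1)
    xs = sorted(agg)

    # Each block holds [label sum, sample count, number of distinct scores].
    blocks = []
    for x in xs:
        total, count = agg[x]
        blocks.append([total, count, 1])
        # Merge backwards while the previous block mean is not below the last.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            t, c, n = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] += n

    fitted = {}
    i = 0
    for total, count, n in blocks:
        for x in xs[i:i + n]:
            fitted[x] = total / count
        i += n
    return fitted


def venn_abers(cal_scores, cal_labels, test_score):
    """Inductive Venn-Abers sketch: return (p0, p1) for one test score.

    p0 and p1 are the calibrated probabilities of label 1 obtained by
    tentatively assigning the test object label 0 and label 1 respectively.
    """
    pair = []
    for y in (0, 1):
        fitted = isotonic_fit(list(cal_scores) + [test_score],
                              list(cal_labels) + [y])
        pair.append(fitted[test_score])
    return pair[0], pair[1]


def merge(p0, p1):
    """Collapse the pair into one probability (a common log-loss choice)."""
    return p1 / (1.0 - p0 + p1)


# Toy calibration data (assumed, not from the paper): label 1 = anomalous.
cal_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
cal_labels = [0, 0, 0, 1, 1, 1]

p0, p1 = venn_abers(cal_scores, cal_labels, 0.85)
print(p0, p1, merge(p0, p1))
```

The width of the interval between p0 and p1 gives a rough indication of how certain the prediction is: a narrow interval suggests the calibration data pins down the probability well, while a wide one flags a prediction that deserves less trust. The naive refit per test object shown here is simple but slow; it is meant to illustrate the idea rather than to be used on a data set of HDFS scale.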