MIR_MAD: An Efficient and On-line Approach for Anomaly Detection in Dynamic Data Stream
Chang How Tan, V. C. Lee, Mahsa Salehi
2020 International Conference on Data Mining Workshops (ICDMW), published 2020-11-01
DOI: 10.1109/ICDMW51313.2020.00065 (https://doi.org/10.1109/ICDMW51313.2020.00065)
Anomaly detection in a dynamic data stream is a challenging task. Because the stream is unbounded and data arrive at a high rate, anomaly detection models cannot store all observations in memory for processing. In addition, the properties of the data stream change over time, that is, the stream exhibits concept drift. While recent studies focus on feature extraction for anomaly detection, most of them assume that data streams are static and ignore the possibility of concept drift. Anomaly detection models must operate efficiently to deal with high-volume, high-velocity data: they must have low complexity and learn incrementally from each arriving observation. Incremental learning allows the model to adapt to concept drift. When the drift rate is higher than the adaptation rate, however, it is preferable to detect the concept drift and retrain a new model in order to minimize the loss in performance. In this paper, we propose MIR_MAD, an approach based on multiple incremental robust Mahalanobis estimators that is efficient, learns incrementally, and can detect concept drift. MIR_MAD is fast, can be initialized with a small amount of data, and can estimate the location of the drift in the data stream. Our empirical results show that MIR_MAD achieves state-of-the-art performance and is significantly faster. We also present a case study showing that detecting concept drift is critical to minimizing the reduction in the model's performance.