State-transition-aware anomaly detection under concept drifts
Authors: Bin Li, Shubham Gupta, Emmanuel Müller
Journal: Data & Knowledge Engineering, volume 154, Article 102365 (published 2024-09-28)
Impact factor: 2.7 (JCR Q3, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.datak.2024.102365
URL: https://www.sciencedirect.com/science/article/pii/S0169023X24000892
Citations: 0
Abstract
Detecting abnormal temporal patterns in streaming data is challenging due to volatile data properties and the lack of real-time labels. Abnormal patterns are usually hidden in the temporal context and cannot be detected by evaluating single points. Furthermore, the normal state evolves over time due to concept drifts, so a single model does not fit all data over time. Autoencoders have recently been applied to unsupervised anomaly detection. However, they are trained on a single normal state and usually become invalid after distributional drifts in the data stream. This paper uses an autoencoder-based approach, STAD, for anomaly detection under concept drifts. In particular, we propose a state-transition-aware model that maps the different data distributions in each period of the data stream into states, thereby addressing the model adaptation problem in an interpretable way. In addition, we analyze statistical tests for drift detection by examining their sensitivity and power. Furthermore, we present several ways to estimate the probability density function used to compare distributional similarity for state transitions. Our experiments evaluate the proposed method on synthetic and real-world datasets. While delivering anomaly detection performance comparable to state-of-the-art approaches, STAD works more efficiently and provides extra interpretability. We also provide an analysis of optimal hyperparameters for efficient model training and adaptation.
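The abstract does not say which statistical test or density estimator STAD uses, so the following is only a rough sketch of the general idea of drift detection followed by state matching, not the authors' method. It assumes a two-sample Kolmogorov-Smirnov test as the drift detector, a fixed-range histogram as the density estimate, and the Jensen-Shannon divergence as the similarity measure. All function names, thresholds, and bin settings are hypothetical, and the autoencoder is left out: the inputs are assumed to be scalar values per time step, such as reconstruction errors.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def drift_detected(reference, window, alpha=0.01):
    """Flag a drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(window)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, window) > critical


def histogram_density(sample, lo, hi, bins=20):
    """Normalized histogram over [lo, hi]; out-of-range values go to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in sample:
        k = min(bins - 1, max(0, int((x - lo) / width)))
        counts[k] += 1
    return [c / len(sample) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(u, v):
        return sum(ui * math.log(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def match_state(window, states, lo, hi, threshold=0.1):
    """Return the index of the most similar known state, or None for a new state.

    `states` is a list of histograms produced by histogram_density with the
    same lo, hi, and number of bins.
    """
    if not states:
        return None
    density = histogram_density(window, lo, hi, bins=len(states[0]))
    scores = [js_divergence(density, s) for s in states]
    best = min(range(len(states)), key=scores.__getitem__)
    return best if scores[best] <= threshold else None
```

In this sketch, a detected drift would trigger `match_state`: an index means the stream has returned to a previously seen state whose model could be reused, while `None` means a new state (and, presumably, a new model) is needed. How STAD actually handles either case is described in the paper, not here.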
About the journal:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.