OIL-AD: An anomaly detection framework for decision-making sequences

IF 7.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition · Pub Date: 2025-04-19 · DOI: 10.1016/j.patcog.2025.111656

Chen Wang, Sarah Erfani, Tansu Alpcan, Christopher Leckie

Pattern Recognition, volume 166, Article 111656. Full text: https://www.sciencedirect.com/science/article/pii/S0031320325003164

Citations: 0

Abstract

Anomaly detection in decision-making sequences is a challenging problem due to the complexity of normality representation learning and the sequential nature of the task. Most existing methods based on Reinforcement Learning (RL) are difficult to implement in the real world due to unrealistic assumptions, such as having access to environment dynamics, reward signals, and online interactions with the environment. To address these limitations, we propose an unsupervised method named Offline Imitation Learning based Anomaly Detection (OIL-AD), which detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association. Our offline learning model is an adaptation of behavioural cloning with a transformer policy network, where we modify the training process to learn a Q function and a state value function from normal trajectories. We propose that the Q function and the state value function can provide sufficient information about agents’ behavioural data, from which we derive two features for anomaly detection. The intuition behind our method is that the action optimality feature derived from the Q function can differentiate the optimal action from others at each local state, and the sequential association feature derived from the state value function has the potential to maintain the temporal correlations between decisions (state–action pairs). Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in $F_1$ score over comparable baselines. The source code is available on https://github.com/chenwang4/OILAD.
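The abstract does not give the exact definitions of the two features, so the following is only a minimal illustrative sketch and not the authors' implementation. It assumes that a Q function and a state value function have already been learned, and that their outputs are available as plain arrays. The function names, the specific formulas (a Q-value gap for action optimality, an absolute change in V for sequential association), the additive combination, and the threshold are all assumptions made for illustration.

```python
import numpy as np


def action_optimality(q_values, actions):
    """ASSUMED definition: gap between the best Q value in each state
    and the Q value of the action actually taken (0 means optimal)."""
    q_values = np.asarray(q_values, dtype=float)   # shape (T, num_actions)
    actions = np.asarray(actions, dtype=int)       # shape (T,)
    taken = q_values[np.arange(len(actions)), actions]
    return q_values.max(axis=1) - taken


def sequential_association(state_values):
    """ASSUMED definition: absolute change in V between consecutive
    states; the first step has no predecessor and scores 0."""
    v = np.asarray(state_values, dtype=float)      # shape (T,)
    return np.abs(np.diff(v, prepend=v[0]))


def anomaly_flags(q_values, actions, state_values, threshold):
    """Flag a step when the sum of the two hypothetical features
    exceeds a user-chosen threshold (also an assumption)."""
    score = (action_optimality(q_values, actions)
             + sequential_association(state_values))
    return score > threshold


if __name__ == "__main__":
    # Toy values for a 3-step trajectory with 2 possible actions.
    q = [[1.0, 0.5], [0.2, 0.9], [0.4, 0.1]]
    a = [0, 0, 0]
    v = [1.0, 1.0, 0.4]
    print(anomaly_flags(q, a, v, threshold=0.5))
```

In this toy trajectory the second step takes a clearly sub-optimal action and the third step shows a sharp drop in state value, so both would be flagged under the assumed scoring. The per-step form reflects the online detection setting mentioned in the abstract; the real feature definitions are in the paper and the linked repository.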




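Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months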

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.