Reinforcement Learning-based Anomaly Detection for PHM applications

2022 IEEE Aerospace Conference (AERO). Pub Date: 2022-03-05. DOI: 10.1109/AERO53065.2022.9843543

Samir Khan, T. Yairi, Shinichi Nakasuka, S. Tsutsumi


Citations: 0

Abstract

Prognostics and Health Management (PHM) is an essential requirement for engineering assets. Its processing strategies include modules for the detection, diagnostics and prognostics of known fault conditions. During operation, however, fault conditions that were not anticipated always arise. These events manifest as anomalies and can be catastrophic, up to the loss of the asset. Because anomalies can indicate an impending fault condition, identifying them automatically can help solve reliability problems that stem from the complexity of the operating environment and from component degradation. Data-driven approaches have become increasingly popular as a comprehensive anomaly detection method whenever data on nominal and fault conditions is available. However, supervised learning techniques often struggle when models are trained on a limited set of partially labelled anomalies while the rest of the dataset is left unlabelled. An alternative is to use unsupervised learning techniques, which are supposed to obviate stipulating the performance of the anomaly detector, but these still often produce many false positives because they lack prior knowledge of true anomalies. This article therefore investigates a Reinforcement Learning (RL)-based approach to the problem of unknown classes of anomalies that might lie beyond the scope of the initially trained model. A Q-learning method is used to exploit the existing data model whilst exploring new classes, in order to improve classification accuracy and optimise decision making. This is of significant practical benefit because anomalies can be unpredictable in form and usually evolve over time. In particular, a deep network-based anomaly detector agent first learns the action-value function (i.e., the Q-value function) from the limited labelled data. An environment is created in which the agent actively interacts with the labelled anomalies and also explores rare and novel unlabelled anomalies that might lie beyond the scope of the initially trained model. A reward function, defined from the sparse normative content, stipulates when the agent has detected the anomaly state. However, the robustness of this method remains an open question, as it simply shifts the responsibility for anomaly detection onto the reward function being used. This shows how strongly the performance of such methods depends on how the state-action space of the problem is defined.
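The Q-learning loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the paper uses a deep network to approximate the Q-value function, whereas the sketch below swaps in a plain Q-table over discretised scalar readings to keep it short. The reward rule, the function names (`reward`, `discretise`, `train_q_table`, `detect`), the hyperparameters and the toy data stream are all assumptions made for illustration.

```python
import random

ACTIONS = (0, 1)  # 0 = report "nominal", 1 = flag "anomaly"


def reward(action, label):
    """Hypothetical sparse reward: only labelled samples give feedback.

    label is 1 (known anomaly), 0 (known nominal) or None (unlabelled).
    """
    if label is None:
        return 0.0
    return 1.0 if action == label else -1.0


def discretise(x, n_bins=10, lo=0.0, hi=1.0):
    """Map a scalar reading in [lo, hi] to an integer bin index (the state)."""
    clipped = min(max(x, lo), hi)
    return min(int((clipped - lo) / (hi - lo) * n_bins), n_bins - 1)


def train_q_table(stream, episodes=200, alpha=0.1, gamma=0.5, epsilon=0.1,
                  n_bins=10, seed=0):
    """Tabular Q-learning over a stream of (reading, label) pairs."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_bins)]
    for _ in range(episodes):
        for t in range(len(stream) - 1):
            s = discretise(stream[t][0], n_bins)
            s_next = discretise(stream[t + 1][0], n_bins)
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[s][act])
            r = reward(a, stream[t][1])
            # Standard Q-learning update towards r + gamma * max_a' Q(s', a').
            td_target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (td_target - q[s][a])
    return q


def detect(q, x, n_bins=10):
    """Greedy decision for a new reading: 1 means 'flag as anomaly'."""
    s = discretise(x, n_bins)
    return max(ACTIONS, key=lambda act: q[s][act])


# Toy stream (made-up values): low readings are labelled nominal, high
# readings are labelled anomalies, mid-range readings are unlabelled.
toy_stream = [(0.10, 0), (0.15, 0), (0.55, None), (0.95, 1),
              (0.20, 0), (0.60, None), (0.92, 1), (0.12, 0)]
q_table = train_q_table(toy_stream)
print(detect(q_table, 0.95), detect(q_table, 0.10))
```

In this sketch the unlabelled readings never receive a non-zero reward, so the agent cannot tell the two actions apart in those states. That mirrors the caveat in the abstract: the quality of the detector rests on how the reward function and the state-action space are defined.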


