DRLLog: Deep Reinforcement Learning for Online Log Anomaly Detection

IF 4.7 | Zone 2 (Computer Science) | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS
Junwei Zhou;Yuyang Gao;Ying Zhu;Xiangtian Yu;Yanchao Yang;Cheng Tan;Jianwen Xiang
DOI: 10.1109/TNSM.2025.3542595
Journal: IEEE Transactions on Network and Service Management, volume 22, issue 3, pages 2382-2395
Published: 2025-02-17
Publication type: Journal Article
Open access: No
URL: https://ieeexplore.ieee.org/document/10891191/
Citations: 0

Abstract

System logs record the system’s status and application behavior, providing support for various system management and diagnostic tasks. However, existing methods for log anomaly detection face several challenges, including limitations in recognizing current types of anomalous logs and difficulties in performing online incremental updates to the anomaly detection models. To address these challenges, this paper introduces DRLLog, which applies Deep Reinforcement Learning (DRL) networks to detect anomalous events. DRLLog uses Deep Q Network (DQN) as the agent, with log entries serving as reward signals. By interacting with the environment generated from log data and adopting various action behaviors, it aims to maximize the reward value obtained as feedback. Through this approach, DRLLog achieves learning from historical log data and perception of the current environment, enabling continuous learning and adaptation to different log sequence patterns. Additionally, DRLLog introduces low-rank adaptation by using two low-rank parameter matrices in the fully connected layer of the DQN to represent changes in its weight matrix. During online model learning, only low-rank parameter matrices of the model are updated, effectively reducing the model’s overhead. Furthermore, DRLLog introduces focal loss to focus more on learning the features of anomalous logs, effectively addressing the issue of imbalanced quantities between normal and anomalous logs. We evaluated the performance on widely used log datasets, including HDFS, BGL and ThunderBird, showing an average improvement of 3% in F1-Score compared to baseline methods. During online model learning, DRLLog achieves an average reduction of 90% in parameter count and a significant decrease in training and testing time as well.
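Two of the components named in the abstract, focal loss and a low-rank update to a fully connected layer, can be illustrated with a minimal sketch. The abstract gives no formulas or hyperparameters, so everything below is an assumption rather than the authors' implementation: the focal loss uses the common binary form -alpha_t * (1 - p_t)^gamma * log(p_t), the low-rank update is written as W x + B (A x), and the names focal_loss, lora_forward, lora_param_counts, and the values of gamma, alpha, the layer sizes, and the rank are hypothetical.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction (assumed standard form).

    p: predicted probability that the log is anomalous, with 0 < p < 1
    y: label, 1 = anomalous, 0 = normal
    gamma, alpha: illustrative defaults, not values taken from the paper
    """
    p_t = p if y == 1 else 1.0 - p
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    # so hard examples dominate the total loss.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x):
    """Fully connected layer with a low-rank weight change: W x + B (A x).

    w: frozen weight matrix, shape d_out x d_in
    a: trainable low-rank matrix, shape rank x d_in
    b: trainable low-rank matrix, shape d_out x rank
    """
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [u + v for u, v in zip(base, delta)]


def lora_param_counts(d_in, d_out, rank):
    """Return (parameters in W, parameters in A and B combined)."""
    return d_in * d_out, rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to show the arithmetic.
    full, low_rank = lora_param_counts(d_in=256, d_out=128, rank=4)
    print("full:", full, "low-rank:", low_rank)
    print("easy anomalous example:", focal_loss(0.9, 1))
    print("hard anomalous example:", focal_loss(0.1, 1))
```

With gamma set to 0 and alpha set to None, focal_loss reduces to ordinary cross-entropy, which makes the effect of the modulating factor easy to compare. The parameter counts printed here depend entirely on the hypothetical layer sizes and rank, and are not meant to reproduce the 90% reduction reported in the abstract.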
Source journal
IEEE Transactions on Network and Service Management (Computer Science: Computer Networks and Communications)
CiteScore: 9.30
Self-citation rate: 15.10%
Publication count: 325
Journal description: IEEE Transactions on Network and Service Management will publish (online only) peer-reviewed, archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) will be encouraged. These transactions will focus on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.