DRLLog: Deep Reinforcement Learning for Online Log Anomaly Detection
Authors: Junwei Zhou; Yuyang Gao; Ying Zhu; Xiangtian Yu; Yanchao Yang; Cheng Tan; Jianwen Xiang
DOI: 10.1109/TNSM.2025.3542595
Journal: IEEE Transactions on Network and Service Management, vol. 22, no. 3, pp. 2382-2395
Published: 2025-02-17
Abstract: System logs record a system's status and application behavior, supporting a wide range of system management and diagnostic tasks. Existing log anomaly detection methods, however, face several challenges, including limited ability to recognize current types of anomalous logs and difficulty performing online incremental updates to the detection model. To address these challenges, this paper introduces DRLLog, which applies Deep Reinforcement Learning (DRL) to detect anomalous events. DRLLog uses a Deep Q-Network (DQN) as the agent, with log entries serving as reward signals. By interacting with an environment generated from log data and taking different actions, the agent seeks to maximize the reward received as feedback. In this way, DRLLog both learns from historical log data and perceives the current environment, enabling continuous learning and adaptation to different log-sequence patterns. Additionally, DRLLog introduces low-rank adaptation: two low-rank parameter matrices in the DQN's fully connected layer represent changes to its weight matrix, so during online learning only these low-rank matrices are updated, substantially reducing the model's overhead. Furthermore, DRLLog adopts focal loss to focus learning on the features of anomalous logs, effectively addressing the class imbalance between normal and anomalous logs. We evaluated DRLLog on widely used log datasets, including HDFS, BGL, and ThunderBird, observing an average F1-score improvement of 3% over baseline methods. During online learning, DRLLog reduces the updated parameter count by 90% on average and significantly decreases training and testing time.
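The two mechanisms the abstract names, low-rank adaptation of a fully connected layer and focal loss for class imbalance, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, shapes, and hyperparameter values (rank, alpha, gamma) are illustrative assumptions, shown only to make the ideas concrete.

```python
import numpy as np

def lora_linear(x, W0, A, B, scale=1.0):
    """Forward pass of a fully connected layer whose base weight W0 is
    frozen; only the low-rank update A @ B (rank r << min(d_in, d_out))
    is trained, as in the low-rank adaptation the abstract describes.
    Shapes (illustrative): x (n, d_in), W0 (d_in, d_out),
    A (d_in, r), B (r, d_out)."""
    return x @ (W0 + scale * (A @ B))

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: the (1 - pt)**gamma factor down-weights
    easy, well-classified examples so training concentrates on the
    rare anomalous class. p are predicted probabilities of the
    anomalous class, y are 0/1 labels."""
    p = np.clip(p, 1e-7, 1 - 1e-7)          # numerical safety
    pt = np.where(y == 1, p, 1 - p)          # prob. of the true class
    w = np.where(y == 1, alpha, 1 - alpha)   # class-balancing weight
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt)))

# Parameter savings of the low-rank update vs. a full weight matrix,
# for illustrative sizes d_in = d_out = 256 and rank r = 8:
d_in, d_out, r = 256, 256, 8
full_params = d_in * d_out            # parameters in a dense update
lora_params = r * (d_in + d_out)      # parameters in A and B
reduction = 1.0 - lora_params / full_params
```

With these illustrative sizes the low-rank update trains roughly 94% fewer parameters than a dense update, which is consistent in spirit with the roughly 90% average reduction the paper reports for online learning.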
Journal Introduction:
IEEE Transactions on Network and Service Management publishes (online only) peer-reviewed, archival-quality papers that advance the state of the art and practical applications of network and service management. Both theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) are encouraged. The transactions focus on key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.