On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations

IF 6.5 1区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering Pub Date : 2024-12-09 DOI:10.1109/TSE.2024.3513413

Xiaoxue Ma;Huiqi Zou;Pinjia He;Jacky Keung;Yishu Li;Xiao Yu;Federica Sarro

{"title":"On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations","authors":"Xiaoxue Ma;Huiqi Zou;Pinjia He;Jacky Keung;Yishu Li;Xiao Yu;Federica Sarro","doi":"10.1109/TSE.2024.3513413","DOIUrl":null,"url":null,"abstract":"Numerous Deep Learning (DL)-based approaches have gained attention in software Log Anomaly Detection (LAD), yet class imbalance in training data remains a challenge, with anomalies often comprising less than 1% of datasets like Thunderbird. Existing DLLAD methods may underperform in severely imbalanced datasets. Although data resampling has proven effective in other software engineering tasks, it has not been explored in LAD. This study aims to fill this gap by providing an in-depth analysis of the impact of diverse data resampling methods on existing DLLAD approaches from two distinct perspectives. Firstly, we assess the performance of these DLLAD approaches across four datasets with different levels of class imbalance, and we explore the impact of resampling ratios of normal to abnormal data on DLLAD approaches. Secondly, we evaluate the effectiveness of the data resampling methods when utilizing optimal resampling ratios of normal to abnormal data. Our findings indicate that oversampling methods generally outperform undersampling and hybrid sampling methods. Data resampling on raw data yields superior results compared to data resampling in the feature space. These improvements are attributed to the increased attention given to important tokens. By exploring the resampling ratio of normal to abnormal data, we suggest generating more data for minority classes through oversampling while removing less data from majority classes through undersampling. In conclusion, our study provides valuable insights into the intricate relationship between data resampling methods and DLLAD. By addressing the challenge of class imbalance, researchers and practitioners can enhance DLLAD performance.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"243-261"},"PeriodicalIF":6.5000,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10786877/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Numerous Deep Learning (DL)-based approaches have gained attention in software Log Anomaly Detection (LAD), yet class imbalance in training data remains a challenge, with anomalies often comprising less than 1% of datasets like Thunderbird. Existing DLLAD methods may underperform in severely imbalanced datasets. Although data resampling has proven effective in other software engineering tasks, it has not been explored in LAD. This study aims to fill this gap by providing an in-depth analysis of the impact of diverse data resampling methods on existing DLLAD approaches from two distinct perspectives. Firstly, we assess the performance of these DLLAD approaches across four datasets with different levels of class imbalance, and we explore the impact of resampling ratios of normal to abnormal data on DLLAD approaches. Secondly, we evaluate the effectiveness of the data resampling methods when utilizing optimal resampling ratios of normal to abnormal data. Our findings indicate that oversampling methods generally outperform undersampling and hybrid sampling methods. Data resampling on raw data yields superior results compared to data resampling in the feature space. These improvements are attributed to the increased attention given to important tokens. By exploring the resampling ratio of normal to abnormal data, we suggest generating more data for minority classes through oversampling while removing less data from majority classes through undersampling. In conclusion, our study provides valuable insights into the intricate relationship between data resampling methods and DLLAD. By addressing the challenge of class imbalance, researchers and practitioners can enhance DLLAD performance.

查看原文本刊更多论文

数据重采样对基于深度学习的日志异常检测的影响：见解和建议

许多基于深度学习（DL）的方法在软件日志异常检测（LAD）中得到了关注，但训练数据中的类不平衡仍然是一个挑战，异常通常占雷鸟等数据集的不到1%。现有的DLLAD方法在严重不平衡的数据集中可能表现不佳。虽然数据重采样已被证明在其他软件工程任务中是有效的，但它尚未在LAD中进行探索。本研究旨在通过从两个不同的角度深入分析不同数据重采样方法对现有DLLAD方法的影响来填补这一空白。首先，我们评估了这些DLLAD方法在四个不同类别失衡水平的数据集上的性能，并探讨了正常数据与异常数据的重采样比对DLLAD方法的影响。其次，利用最优的正态数据和异常数据的重采样比，评估了数据重采样方法的有效性。我们的研究结果表明，过采样方法通常优于欠采样和混合采样方法。在原始数据上的数据重采样比在特征空间上的数据重采样效果更好。这些改进归功于对重要代币的更多关注。通过探索正常数据与异常数据的重采样比，我们建议通过过采样为少数类生成更多的数据，而通过欠采样从多数类中去除更少的数据。总之，我们的研究为数据重采样方法与DLLAD之间的复杂关系提供了有价值的见解。通过解决阶级不平衡的挑战，研究人员和实践者可以提高DLLAD的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Transactions on Software Engineering 工程技术-工程：电子与电气

CiteScore

9.70

自引率

10.80%

发文量

724

审稿时长

6 months

期刊介绍： IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.