Simplifying software compliance: AI technologies in drafting technical documentation for the AI Act.
Francesco Sovrano, Emmie Hine, Stefano Anzolut, Alberto Bacchelli
Empirical Software Engineering 30(3):91 (2025). DOI: 10.1007/s10664-025-10645-x
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11965209/pdf/

Abstract: The European AI Act has introduced specific technical documentation requirements for AI systems. Complying with them is challenging, since it demands advanced knowledge of both legal and technical aspects, which is rare among software developers and legal professionals; small and medium-sized enterprises may therefore face high compliance costs. In this study, we explore how contemporary AI technologies, including ChatGPT and an existing compliance tool (DoXpert), can aid software developers in creating technical documentation that complies with the AI Act. We specifically demonstrate how these AI tools can identify gaps in existing documentation according to the provisions of the AI Act. Using open-source high-risk AI systems as case studies, we collaborated with legal experts to evaluate how closely tool-generated assessments align with expert opinions. The findings show partial alignment, notable issues with ChatGPT (3.5 and 4), and a moderate, statistically significant correlation between DoXpert assessments and expert judgments according to a rank-biserial correlation analysis. Nonetheless, these findings underscore the potential of AI to complement human analysis and alleviate the compliance burden, supporting the broader goal of fostering responsible and transparent AI development under emerging regulatory frameworks.
{"title":"On the effects of program slicing for vulnerability detection during code inspection.","authors":"Aurora Papotti, Katja Tuma, Fabio Massacci","doi":"10.1007/s10664-025-10636-y","DOIUrl":"10.1007/s10664-025-10636-y","url":null,"abstract":"<p><p>Slicing is a fault localization technique that has been proposed to support debugging and program comprehension. Yet, its empirical effectiveness during code inspection by humans has received limited attention. The goal of our study is two-fold. First, we aim to define what it means for a code reviewer to identify the vulnerable lines correctly. Second, we investigate whether reducing the number of to-be-inspected lines by method-level slicing supports code reviewers in detecting security vulnerabilities. We propose a novel approach based on the notion of a <math><mi>δ</mi></math> -neighborhood (intuitively based on the idea of the context size of the command git diff) to define correctly identified lines. Then, we conducted a multi-year controlled experiment (2017-2023) in which MSc students attending security courses ( <math><mrow><mi>n</mi> <mo>=</mo> <mn>236</mn></mrow> </math> ) were tasked with identifying vulnerable lines in original or sliced Java files from Apache Tomcat. We provide perfect seed lines for a slicing algorithm to control for confounding factors. Each treatment differs in the pair (Vulnerability, Original/Sliced) with a balanced design with vulnerabilities from the OWASP Top 10 2017: A1 (Injection), A5 (Broken Access Control), A6 (Security Misconfiguration), and A7 (Cross-Site Scripting). To generate smaller slices for human consumption, we used a variant of intra-procedural thin slicing. We report the results for <math><mrow><mi>δ</mi> <mo>=</mo> <mn>0</mn></mrow> </math> which corresponds to exactly matching the vulnerable ground truth lines, and <math><mrow><mi>δ</mi> <mo>=</mo> <mn>3</mn></mrow> </math> which represents the scenario of identifying the vulnerable area. For both cases, we found that slicing helps in 'finding something' (the participant has found at least some vulnerable lines) as opposed to 'finding nothing'. For the case of <math><mrow><mi>δ</mi> <mo>=</mo> <mn>0</mn></mrow> </math> analyzing a slice and analyzing the original file are statistically equivalent from the perspective of lines found by those who found something. With <math><mrow><mi>δ</mi> <mo>=</mo> <mn>3</mn></mrow> </math> slicing helps to find more vulnerabilities compared to analyzing an original file, as we would normally expect. Given the type of population, additional experiments are necessary to be generalized to experienced developers.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 3","pages":"93"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972194/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143802692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Reinforcement learning for online testing of autonomous driving systems: a replication and extension study.
Luca Giamattei, Matteo Biagiola, Roberto Pietrantuono, Stefano Russo, Paolo Tonella
Empirical Software Engineering 30(1):19 (2025). DOI: 10.1007/s10664-024-10562-5
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538197/pdf/

Abstract: In a recent study, Reinforcement Learning (RL), used in combination with many-objective search, was shown to outperform alternative techniques (random search and many-objective search) for online testing of Deep Neural Network-enabled systems. The empirical evaluation of these techniques was conducted on a state-of-the-art Autonomous Driving System (ADS). This work is a replication and extension of that empirical study. Our replication shows that RL does not outperform pure random test generation in a comparison conducted under the same settings as the original study, but with no confounding factor coming from the way collisions are measured. Our extension aims at eliminating some of the possible reasons for the poor performance of RL observed in our replication: (1) the presence of reward components providing contrasting feedback to the RL agent; (2) the usage of an RL algorithm (Q-learning) that requires discretization of an intrinsically continuous state space. Results show that our new RL agent is able to converge to an effective policy that outperforms random search. The results also highlight other possible improvements, opening further investigations on how to best leverage RL for online ADS testing.

An empirical study of fault localisation techniques for deep neural networks.
Nargiz Humbatova, Jinhan Kim, Gunel Jahangirova, Shin Yoo, Paolo Tonella
Empirical Software Engineering 30(5):124 (2025). DOI: 10.1007/s10664-025-10657-7
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12152046/pdf/

Abstract: With the increased popularity of Deep Neural Networks (DNNs), the need for tools that assist developers in implementing, testing, and debugging DNNs also grows. Several approaches have been proposed that automatically analyse and localise potential faults in DNNs under test. In this work, we evaluate and compare existing state-of-the-art fault localisation techniques, which operate based on both dynamic and static analysis of the DNN. The evaluation is performed on a benchmark consisting of both real faults obtained from bug-reporting platforms and faulty models produced by a mutation tool. Our findings indicate that using a single, specific ground truth (e.g., the human-defined one) for evaluating DNN fault localisation tools results in low performance (a maximum average recall of 0.33 and precision of 0.21). However, these figures increase when considering the alternative, equivalent patches that exist for a given faulty DNN. The results indicate that DeepFD is the most effective tool, achieving an average recall of 0.55 and a precision of 0.37 on our benchmark.

Understanding security tactics in microservice APIs using annotated software architecture decomposition models - a controlled experiment.
Patric Genfer, Souhaila Serbout, Georg Simhandl, Uwe Zdun, Cesare Pautasso
Empirical Software Engineering 30(3):66 (2025). DOI: 10.1007/s10664-024-10601-1
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11828814/pdf/

Abstract: While microservice architectures have become a widespread option for designing distributed applications, designing secure microservice systems remains challenging. Although various security-related guidelines and practices exist, the sheer size of these systems, their complex communication structures, and their polyglot tech stacks make it difficult to manually validate whether adequate security tactics are applied throughout the architecture. To address these challenges, we devised a novel solution that automatically generates security-annotated software decomposition models and uses security-based metrics to guide software architects through the assessment of the security tactics employed within microservice systems. To evaluate the effectiveness of our artifacts, we conducted a controlled experiment in which we asked 60 students from two universities and ten experts from industry to identify and assess the security features of two microservice reference systems. During the experiment, we tracked the correctness of their answers and the time they needed to solve the given tasks to measure how well they could understand the security tactics applied in the reference systems. Our results indicate that the supplemental material significantly improved the correctness of the participants' answers without requiring them to consult the documentation more. Most participants also stated in a self-assessment that their understanding of the security tactics used in the systems improved significantly because of the provided material, with the additional diagrams considered very helpful. In contrast, the perception of architectural metrics varied widely. We also show that novice developers benefited most from the supplementary diagrams, whereas senior developers could rely on their experience to compensate for the lack of additional help. Contrary to our expectations, we found no significant correlation between the time spent solving the tasks and the overall correctness score achieved: participants who took more time to read the documentation did not automatically achieve better results. To our knowledge, this empirical study is the first to explore the influence of security annotations in component diagrams on software developers assessing microservice system security.
{"title":"Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub.","authors":"Francesco Minna, Fabio Massacci, Katja Tuma","doi":"10.1007/s10664-025-10688-0","DOIUrl":"https://doi.org/10.1007/s10664-025-10688-0","url":null,"abstract":"<p><p>Helm is a package manager that allows defining, installing, and upgrading applications with Kubernetes (K8s), a popular container orchestration platform. A Helm chart is a collection of files describing all dependencies, resources, and parameters required for deploying an application within a K8s cluster. This study aimed to mine and empirically evaluate the security of Helm charts, comparing the performance of existing tools in terms of misconfigurations reported by policies available by default, and measuring to what extent LLMs could be used for removing misconfigurations. For these reasons, we proposed a pipeline to mine Helm charts from Artifact Hub, a popular centralized repository, and analyze them using state-of-the-art open-source tools like Checkov and KICS. First, the pipeline runs several chart analyzers and identifies the common and unique misconfigurations reported by each tool. Secondly, it uses LLMs to suggest a mitigation for each misconfiguration. Finally, the LLM refactored chart previously generated is analyzed again by the same tools to see whether it satisfies the tool's policies. We also performed a manual analysis on a subset of charts to evaluate whether there are false positive misconfigurations from the tool's reporting and in the LLM refactoring. We found that (i) there is a significant difference between LLMs, (ii) providing a snippet of the YAML template as input might be insufficient compared to all resources, and (iii) even though LLMs can generate correct fixes, they may also delete other irrelevant configurations that break the application.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 5","pages":"132"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12227474/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144575074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

AI support for data scientists: An empirical study on workflow and alternative code recommendations.
Dhivyabharathi Ramasamy, Cristina Sarasua, Abraham Bernstein
Empirical Software Engineering 30(5):133 (2025). DOI: 10.1007/s10664-025-10622-4
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12227384/pdf/

Abstract: Despite the popularity of AI assistants for coding activities, there is limited empirical work on whether these coding assistants can help users complete data science tasks. Moreover, in data science programming, exploring alternative paths has been widely advocated, as such paths may lead to diverse understandings and conclusions (Gelman and Loken 2013; Kale et al. 2019). Whether existing AI-based coding assistants can support data scientists in exploring relevant alternative paths remains unexplored. To fill this gap, we conducted a mixed-methods study of how data scientists solved different data science tasks with the help of an AI-based coding assistant that provides explicit alternatives as recommendations throughout the data science workflow. Specifically, we quantitatively investigated whether users accept the AI assistant's code recommendations, including the alternative recommendations, and whether the recommendations are helpful when completing descriptive and predictive data science tasks. We also investigated whether including in a prompt the data science step (e.g., data exploration) for which recommendations are sought leads to more helpful recommendations. We found that including the data science step in a prompt produced a statistically significant improvement in the acceptance of recommendations, whereas the presence of alternatives did not lead to any significant differences. Our study also shows a statistically significant difference in the acceptance and usefulness of recommendations between descriptive and predictive tasks. Participants generally had positive sentiments regarding AI assistance and our proposed interface. We share further insights on the interactions that emerged during the study and the challenges our users encountered while solving their data science tasks.

Supplementary information: The online version contains supplementary material available at 10.1007/s10664-025-10622-4.

Assessing the adoption of security policies by developers in terraform across different cloud providers.
Alexandre Verdet, Mohammad Hamdaqa, Leuson Da Silva, Foutse Khomh
Empirical Software Engineering 30(3):74 (2025). DOI: 10.1007/s10664-024-10610-0
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11868142/pdf/

Abstract: Cloud computing has become popular thanks to the widespread use of Infrastructure as Code (IaC) tools, which allow the community to manage and configure cloud infrastructure using scripts. However, the scripting process does not automatically prevent practitioners from introducing misconfigurations, vulnerabilities, or privacy risks. As a result, ensuring security relies on practitioners' understanding and the adoption of explicit policies. To understand how practitioners deal with this problem, we perform an empirical study analyzing the adoption of scripted security best practices in Terraform files for AWS, Azure, and Google Cloud. We assess the adoption of these practices by analyzing a sample of 812 open-source GitHub projects, scanning each project's configuration files for policy implementations through static analysis (Checkov and Tfsec). The category Access policy emerges as the most widely adopted across all providers, while Encryption at rest contains the most neglected policies. Regarding the cloud providers, we observe that AWS and Azure exhibit similar behavior in terms of adopted and neglected policies. Finally, we provide guidelines for cloud practitioners to limit infrastructure vulnerability and discuss further aspects of policies that have yet to be widely embraced within the industry.
{"title":"The effect of data complexity on classifier performance.","authors":"Jonas Eberlein, Daniel Rodriguez, Rachel Harrison","doi":"10.1007/s10664-024-10554-5","DOIUrl":"10.1007/s10664-024-10554-5","url":null,"abstract":"<p><p>The research area of Software Defect Prediction (SDP) is both extensive and popular, and is often treated as a classification problem. Improvements in classification, pre-processing and tuning techniques, (together with many factors which can influence model performance) have encouraged this trend. However, no matter the effort in these areas, it seems that there is a ceiling in the performance of the classification models used in SDP. In this paper, the issue of classifier performance is analysed from the perspective of data complexity. Specifically, data complexity metrics are calculated using the Unified Bug Dataset, a collection of well-known SDP datasets, and then checked for correlation with the defect prediction performance of machine learning classifiers (in particular, the classifiers C5.0, Naive Bayes, Artificial Neural Networks, Random Forests, and Support Vector Machines). In this work, different domains of competence and incompetence are identified for the classifiers. Similarities and differences between the classifiers and the performance metrics are found and the Unified Bug Dataset is analysed from the perspective of data complexity. We found that certain classifiers work best in certain situations and that all data complexity metrics can be problematic, although certain classifiers did excel in some situations.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 1","pages":"16"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11527945/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142570943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A comprehensive study of machine learning techniques for log-based anomaly detection.
Shan Ali, Chaima Boufaied, Domenico Bianculli, Paula Branco, Lionel Briand
Empirical Software Engineering 30(5):129 (2025). DOI: 10.1007/s10664-025-10669-3
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12185583/pdf/

Abstract: Growth in system complexity increases the need for automated techniques dedicated to different log analysis tasks, such as Log-based Anomaly Detection (LAD). LAD has been widely addressed in the literature, mostly by means of a variety of deep learning techniques. However, despite their many advantages, that focus on deep learning is somewhat arbitrary: traditional Machine Learning (ML) techniques may perform well in many cases, depending on the context and the datasets. In the same vein, semi-supervised techniques deserve the same attention as supervised techniques, since the former have clear practical advantages. Further, current evaluations mostly rely on assessing detection accuracy, which is not enough to decide whether a specific ML technique is suitable for addressing the LAD problem in a given context. Other aspects to consider include training and prediction times, as well as the sensitivity to hyperparameter tuning, which in practice matters to engineers. In this paper, we present a comprehensive empirical study evaluating a wide array of supervised and semi-supervised, traditional and deep ML techniques against four evaluation criteria: detection accuracy, time performance, and the sensitivity of each to hyperparameter tuning. Our goal is to provide much stronger and more comprehensive evidence regarding the relative advantages and drawbacks of alternative techniques for LAD. The experimental results show that supervised traditional and deep ML techniques fare similarly in terms of detection accuracy and prediction time on most of the benchmark datasets considered in our study. Moreover, sensitivity analysis with respect to detection accuracy shows that supervised traditional ML techniques are overall less sensitive to hyperparameter tuning than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.