Empirical Software Engineering最新文献

筛选
英文 中文
SecMLOps: A comprehensive framework for integrating security throughout the machine learning operations lifecycle. SecMLOps:用于在整个机器学习操作生命周期中集成安全性的综合框架。
IF 3.6 2区 计算机科学
Empirical Software Engineering Pub Date : 2026-01-01 Epub Date: 2026-02-11 DOI: 10.1007/s10664-025-10795-y
Xinrui Zhang, Pincan Zhao, Jason Jaskolka, Heng Li, Rongxing Lu
{"title":"SecMLOps: A comprehensive framework for integrating security throughout the machine learning operations lifecycle.","authors":"Xinrui Zhang, Pincan Zhao, Jason Jaskolka, Heng Li, Rongxing Lu","doi":"10.1007/s10664-025-10795-y","DOIUrl":"10.1007/s10664-025-10795-y","url":null,"abstract":"<p><p>Machine Learning (ML) has emerged as a pivotal technology in the operation of large and complex systems, driving advancements in fields such as autonomous vehicles, healthcare diagnostics, and financial fraud detection. Despite its benefits, the deployment of ML models brings significant security challenges, such as adversarial attacks, which can compromise the integrity and reliability of these systems. To address these challenges, this paper builds upon the concept of Secure Machine Learning Operations (SecMLOps), providing a comprehensive framework designed to integrate robust security measures throughout the entire ML operations (MLOps) lifecycle. SecMLOps builds on the principles of MLOps by embedding security considerations from the initial design phase through to deployment and continuous monitoring. This framework is particularly focused on safeguarding against sophisticated attacks that target various stages of the MLOps lifecycle, thereby enhancing the resilience and trustworthiness of ML applications. A detailed advanced pedestrian detection system (PDS) use case demonstrates the practical application of SecMLOps in securing critical MLOps. Through extensive empirical evaluations, we highlight the trade-offs between security measures and system performance, providing critical insights into optimizing security without unduly impacting operational efficiency. Our findings underscore the importance of a balanced approach, offering valuable guidance for practitioners on how to achieve an optimal balance between security and performance in ML deployments across various domains.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"31 3","pages":"74"},"PeriodicalIF":3.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12894179/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146200483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Tools and benchmarks evolve: what is their impact on parameter tuning in SBSE experiments? 工具和基准的发展:它们对SBSE实验中的参数调优有什么影响?
IF 3.6 2区 计算机科学
Empirical Software Engineering Pub Date : 2026-01-01 Epub Date: 2025-11-04 DOI: 10.1007/s10664-025-10733-y
Amid Golmohammadi, Man Zhang, Andrea Arcuri
{"title":"Tools and benchmarks evolve: what is their impact on parameter tuning in SBSE experiments?","authors":"Amid Golmohammadi, Man Zhang, Andrea Arcuri","doi":"10.1007/s10664-025-10733-y","DOIUrl":"10.1007/s10664-025-10733-y","url":null,"abstract":"<p><p>In this article, we explore the impact of tool development and its evolution in Search-Based Software Engineering (SBSE) research. As a research tool evolves throughout the years, experiments with novel techniques might require reevaluation of previous studies, especially regarding parameter tuning. These reevaluations also give the opportunity to address the threats to external validity of these previous studies by employing a larger selection of artifacts. To conduct the replicated experiments in this study, the search-based fuzzer EvoMaster is chosen. This SBSE tool has been developed and extended throughout several years (since 2016) and tens of scientific studies. Among the chosen tool's parameters, 6 were carefully selected based on 5 previous studies that we replicate in this article with the latest version of EvoMaster. The replication is applied across an expanded set of artifacts compared to the original replicated studies. Our objective is to validate the robustness and validity of previous findings and to determine the need for parameter tuning in response to the tool's continuous development. Beyond replication, we explored parameter tuning by testing 729 different configurations to find a more performant parameter set, which is later validated through additional rounds of experiments. Additionally, we analyzed the impact of individual parameters on test generation performance using machine learning models, providing insights into their relative effects. Our findings indicate that, although most parameters maintain their efficacy, 2 of them require adjustment. Furthermore, the investigation into the effects of combining different parameter values reveals that carefully optimized configurations can outperform default settings. These findings highlight the importance of regularly reevaluating parameter settings to enhance tool performance in SBSE research.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"31 1","pages":"8"},"PeriodicalIF":3.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12583389/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145451308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Less is more: usefulness of data flow diagrams and large language models for security threat validation. 少即是多:数据流图和用于安全威胁验证的大型语言模型的有用性。
IF 3.6 2区 计算机科学
Empirical Software Engineering Pub Date : 2026-01-01 Epub Date: 2026-04-21 DOI: 10.1007/s10664-026-10837-z
Winnie Bahati Mbaka, Katja Tuma
{"title":"Less is more: usefulness of data flow diagrams and large language models for security threat validation.","authors":"Winnie Bahati Mbaka, Katja Tuma","doi":"10.1007/s10664-026-10837-z","DOIUrl":"https://doi.org/10.1007/s10664-026-10837-z","url":null,"abstract":"<p><p>The arrival of recent cybersecurity standards has raised the bar for security assessments in organizations, but existing techniques require a high manual effort. Threat analysis and risk assessment are used to identify security threats for new or refactored systems. Still, there is a lack of definition-of-done, so identified threats have to be validated which slows down the analysis. Existing literature has focused on the overall effectiveness of threat analysis, but no previous work has investigated what material must the analysts use to effectively validate the identified security threats. We conduct a controlled experiment with practitioners to investigate whether having some analysis material (either the system's graphical model or LLM-generated advice) is better than none, and whether having both the system's graphical model and LLM-generated advice is better than having only one of them. We run a pilot of the experiment with 41 MSc students, a think-aloud study with three practitioners, and the experiment survey with 68 recruited practitioners. Our main findings suggest that, in terms of additional material needed for threat validation, less is more. We also find that participants perceived the graphical model as equally useful compared to LLMs and that, despite LLMs not always providing conclusive advice, practitioners still perceived it as somewhat useful. The experimental material and data analysis scripts is publicly available in a replication package.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"31 5","pages":"122"},"PeriodicalIF":3.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13099727/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147766143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Simplifying software compliance: AI technologies in drafting technical documentation for the AI Act. 简化软件遵从性:AI法案技术文档起草中的AI技术。
IF 3.5 2区 计算机科学
Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-04-02 DOI: 10.1007/s10664-025-10645-x
Francesco Sovrano, Emmie Hine, Stefano Anzolut, Alberto Bacchelli
{"title":"Simplifying software compliance: AI technologies in drafting technical documentation for the AI Act.","authors":"Francesco Sovrano, Emmie Hine, Stefano Anzolut, Alberto Bacchelli","doi":"10.1007/s10664-025-10645-x","DOIUrl":"10.1007/s10664-025-10645-x","url":null,"abstract":"<p><p>The European AI Act has introduced specific technical documentation requirements for AI systems. Compliance with them is challenging due to the need for advanced knowledge of both legal and technical aspects, which is rare among software developers and legal professionals. Consequently, small and medium-sized enterprises may face high costs in meeting these requirements. In this study, we explore how contemporary AI technologies, including ChatGPT and an existing compliance tool (DoXpert), can aid software developers in creating technical documentation that complies with the AI Act. We specifically demonstrate how these AI tools can identify gaps in existing documentation according to the provisions of the AI Act. Using open-source high-risk AI systems as case studies, we collaborated with legal experts to evaluate how closely tool-generated assessments align with expert opinions. Findings show partial alignment, important issues with ChatGPT (3.5 and 4), and a moderate (and statistically significant) correlation between DoXpert and expert judgments, according to the Rank Biserial Correlation analysis. Nonetheless, these findings underscore the potential of AI to combine with human analysis and alleviate the compliance burden, supporting the broader goal of fostering responsible and transparent AI development under emerging regulatory frameworks.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 3","pages":"91"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11965209/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143794942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
On the effects of program slicing for vulnerability detection during code inspection. 论程序切片在代码检查过程中漏洞检测中的作用。
IF 3.5 2区 计算机科学
Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-04-05 DOI: 10.1007/s10664-025-10636-y
Aurora Papotti, Katja Tuma, Fabio Massacci
{"title":"On the effects of program slicing for vulnerability detection during code inspection.","authors":"Aurora Papotti, Katja Tuma, Fabio Massacci","doi":"10.1007/s10664-025-10636-y","DOIUrl":"10.1007/s10664-025-10636-y","url":null,"abstract":"<p><p>Slicing is a fault localization technique that has been proposed to support debugging and program comprehension. Yet, its empirical effectiveness during code inspection by humans has received limited attention. The goal of our study is two-fold. First, we aim to define what it means for a code reviewer to identify the vulnerable lines correctly. Second, we investigate whether reducing the number of to-be-inspected lines by method-level slicing supports code reviewers in detecting security vulnerabilities. We propose a novel approach based on the notion of a <math><mi>δ</mi></math> -neighborhood (intuitively based on the idea of the context size of the command git  diff) to define correctly identified lines. Then, we conducted a multi-year controlled experiment (2017-2023) in which MSc students attending security courses ( <math><mrow><mi>n</mi> <mo>=</mo> <mn>236</mn></mrow> </math> ) were tasked with identifying vulnerable lines in original or sliced Java files from Apache Tomcat. We provide perfect seed lines for a slicing algorithm to control for confounding factors. Each treatment differs in the pair (Vulnerability, Original/Sliced) with a balanced design with vulnerabilities from the OWASP Top 10 2017: A1 (Injection), A5 (Broken Access Control), A6 (Security Misconfiguration), and A7 (Cross-Site Scripting). To generate smaller slices for human consumption, we used a variant of intra-procedural thin slicing. We report the results for <math><mrow><mi>δ</mi> <mo>=</mo> <mn>0</mn></mrow> </math> which corresponds to exactly matching the vulnerable ground truth lines, and <math><mrow><mi>δ</mi> <mo>=</mo> <mn>3</mn></mrow> </math> which represents the scenario of identifying the vulnerable area. For both cases, we found that slicing helps in 'finding something' (the participant has found at least some vulnerable lines) as opposed to 'finding nothing'. For the case of <math><mrow><mi>δ</mi> <mo>=</mo> <mn>0</mn></mrow> </math> analyzing a slice and analyzing the original file are statistically equivalent from the perspective of lines found by those who found something. With <math><mrow><mi>δ</mi> <mo>=</mo> <mn>3</mn></mrow> </math> slicing helps to find more vulnerabilities compared to analyzing an original file, as we would normally expect. Given the type of population, additional experiments are necessary to be generalized to experienced developers.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 3","pages":"93"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972194/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143802692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An empirical study of fault localisation techniques for deep neural networks. 深度神经网络故障定位技术的实证研究。
IF 3.6 2区 计算机科学
Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-06-10 DOI: 10.1007/s10664-025-10657-7
Nargiz Humbatova, Jinhan Kim, Gunel Jahangirova, Shin Yoo, Paolo Tonella
{"title":"An empirical study of fault localisation techniques for deep neural networks.","authors":"Nargiz Humbatova, Jinhan Kim, Gunel Jahangirova, Shin Yoo, Paolo Tonella","doi":"10.1007/s10664-025-10657-7","DOIUrl":"10.1007/s10664-025-10657-7","url":null,"abstract":"<p><p>With the increased popularity of Deep Neural Networks (DNNs), increases also the need for tools to assist developers in the DNN implementation, testing and debugging process. Several approaches have been proposed that automatically analyse and localise potential faults in DNNs under test. In this work, we evaluate and compare existing state-of-the-art fault localisation techniques, which operate based on both dynamic and static analysis of the DNN. The evaluation is performed on a benchmark consisting of both real faults obtained from bug reporting platforms and faulty models produced by a mutation tool. Our findings indicate that the usage of a single, specific ground truth (e.g. the human-defined one) for the evaluation of DNN fault localisation tools results in pretty low performance (maximum average recall of 0.33 and precision of 0.21). However, such figures increase when considering alternative, equivalent patches that exist for a given faulty DNN. The results indicate that DeepFD is the most effective tool, achieving an average recall of 0.55 and a precision of 0.37 on our benchmark.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 5","pages":"124"},"PeriodicalIF":3.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12152046/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144283001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Reinforcement learning for online testing of autonomous driving systems: a replication and extension study. 用于自动驾驶系统在线测试的强化学习:一项复制和扩展研究。
IF 3.5 2区 计算机科学
Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2024-11-05 DOI: 10.1007/s10664-024-10562-5
Luca Giamattei, Matteo Biagiola, Roberto Pietrantuono, Stefano Russo, Paolo Tonella
{"title":"Reinforcement learning for online testing of autonomous driving systems: a replication and extension study.","authors":"Luca Giamattei, Matteo Biagiola, Roberto Pietrantuono, Stefano Russo, Paolo Tonella","doi":"10.1007/s10664-024-10562-5","DOIUrl":"10.1007/s10664-024-10562-5","url":null,"abstract":"<p><p>In a recent study, Reinforcement Learning (RL) used in combination with many-objective search, has been shown to outperform alternative techniques (random search and many-objective search) for online testing of Deep Neural Network-enabled systems. The empirical evaluation of these techniques was conducted on a state-of-the-art Autonomous Driving System (ADS). This work is a replication and extension of that empirical study. Our replication shows that RL does not outperform pure random test generation in a comparison conducted under the same settings of the original study, but with no confounding factor coming from the way collisions are measured. Our extension aims at eliminating some of the possible reasons for the poor performance of RL observed in our replication: (1) the presence of reward components providing contrasting feedback to the RL agent; (2) the usage of an RL algorithm (Q-learning) which requires discretization of an intrinsically continuous state space. Results show that our new RL agent is able to converge to an effective policy that outperforms random search. Results also highlight other possible improvements, which open to further investigations on how to best leverage RL for online ADS testing.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 1","pages":"19"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142602130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Understanding security tactics in microservice APIs using annotated software architecture decomposition models - a controlled experiment. 使用带注释的软件架构分解模型理解微服务api中的安全策略——一个受控实验。
IF 3.5 2区 计算机科学
Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-02-14 DOI: 10.1007/s10664-024-10601-1
Patric Genfer, Souhaila Serbout, Georg Simhandl, Uwe Zdun, Cesare Pautasso
{"title":"Understanding security tactics in microservice APIs using annotated software architecture decomposition models - a controlled experiment.","authors":"Patric Genfer, Souhaila Serbout, Georg Simhandl, Uwe Zdun, Cesare Pautasso","doi":"10.1007/s10664-024-10601-1","DOIUrl":"10.1007/s10664-024-10601-1","url":null,"abstract":"<p><p>While microservice architectures have become a widespread option for designing distributed applications, designing secure microservice systems remains challenging. Although various security-related guidelines and practices exist, these systems' sheer size, complex communication structures, and polyglot tech stacks make it difficult to manually validate whether adequate security tactics are applied throughout their architecture. To address these challenges, we have devised a novel solution that involves the automatic generation of security-annotated software decomposition models and the utilization of security-based metrics to guide software architectures through the assessment of security tactics employed within microservice systems. To evaluate the effectiveness of our artifacts, we conducted a controlled experiment where we asked 60 students from two universities and ten experts from the industry to identify and assess the security features of two microservice reference systems. During the experiment, we tracked the correctness of their answers and the time they needed to solve the given tasks to measure how well they could understand the security tactics applied in the reference systems. Our results indicate that the supplemental material significantly improved the correctness of the participants' answers without requiring them to consult the documentation more. Most participants also stated in a self-assessment that their understanding of the security tactics used in the systems improved significantly because of the provided material, with the additional diagrams considered very helpful. In contrast, the perception of architectural metrics varied widely. We could also show that novice developers benefited most from the supplementary diagrams. In contrast, senior developers could rely on their experience to compensate for the lack of additional help. Contrary to our expectations, we found no significant correlation between the time spent solving the tasks and the overall correctness score achieved, meaning that participants who took more time to read the documentation did not automatically achieve better results. As far as we know, this empirical study is the first analysis that explores the influence of security annotations in component diagrams to guide software developers when assessing microservice system security.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 3","pages":"66"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11828814/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143432508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
AI support for data scientists: An empirical study on workflow and alternative code recommendations. 对数据科学家的人工智能支持:关于工作流和替代代码建议的实证研究。
IF 3.5 2区 计算机科学
Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-07-04 DOI: 10.1007/s10664-025-10622-4
Dhivyabharathi Ramasamy, Cristina Sarasua, Abraham Bernstein
{"title":"AI support for data scientists: An empirical study on workflow and alternative code recommendations.","authors":"Dhivyabharathi Ramasamy, Cristina Sarasua, Abraham Bernstein","doi":"10.1007/s10664-025-10622-4","DOIUrl":"10.1007/s10664-025-10622-4","url":null,"abstract":"<p><p>Despite the popularity of AI assistants for coding activities, there is limited empirical work on whether these coding assistants can help users complete data science tasks. Moreover, in data science programming, exploring alternative paths has been widely advocated, as such paths may lead to diverse understandings and conclusions (Gelman and Loken 2013; Kale et al. 2019). Whether existing AI-based coding assistants can support data scientists in exploring the relevant alternative paths remains unexplored. To fill this gap, we conducted a mixed-methods study to understand how data scientists solved different data science tasks with the help of an AI-based coding assistant that provides explicit alternatives as recommendations throughout the data science workflow. Specifically, we quantitatively investigated whether the users accept the code recommendations, including alternative recommendations, by the AI assistant and whether the recommendations are helpful when completing descriptive and predictive data science tasks. Through the empirical study, we also investigated if including information about the data science step (e.g., data exploration) they seek recommendations for in a prompt leads to helpful recommendations. In our study, we found that including the data science step in a prompt had a statistically significant improvement in the acceptance of recommendations, whereas the presence of alternatives did not lead to any significant differences. Our study also shows a statistically significant difference in the acceptance and usefulness of recommendations between descriptive and predictive tasks. Participants generally had positive sentiments regarding AI assistance and our proposed interface. We share further insights on the interactions that emerged during the study and the challenges that our users encountered while solving their data science tasks.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1007/s10664-025-10622-4.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 5","pages":"133"},"PeriodicalIF":3.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12227384/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144575073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub. 分析和减轻(使用llm)来自Artifact Hub的Helm图表的安全错误配置。
IF 3.6 2区 计算机科学
Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-07-04 DOI: 10.1007/s10664-025-10688-0
Francesco Minna, Fabio Massacci, Katja Tuma
{"title":"Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub.","authors":"Francesco Minna, Fabio Massacci, Katja Tuma","doi":"10.1007/s10664-025-10688-0","DOIUrl":"10.1007/s10664-025-10688-0","url":null,"abstract":"<p><p>Helm is a package manager that allows defining, installing, and upgrading applications with Kubernetes (K8s), a popular container orchestration platform. A Helm chart is a collection of files describing all dependencies, resources, and parameters required for deploying an application within a K8s cluster. This study aimed to mine and empirically evaluate the security of Helm charts, comparing the performance of existing tools in terms of misconfigurations reported by policies available by default, and measuring to what extent LLMs could be used for removing misconfigurations. For these reasons, we proposed a pipeline to mine Helm charts from Artifact Hub, a popular centralized repository, and analyze them using state-of-the-art open-source tools like Checkov and KICS. First, the pipeline runs several chart analyzers and identifies the common and unique misconfigurations reported by each tool. Secondly, it uses LLMs to suggest a mitigation for each misconfiguration. Finally, the LLM refactored chart previously generated is analyzed again by the same tools to see whether it satisfies the tool's policies. We also performed a manual analysis on a subset of charts to evaluate whether there are false positive misconfigurations from the tool's reporting and in the LLM refactoring. We found that (i) there is a significant difference between LLMs, (ii) providing a snippet of the YAML template as input might be insufficient compared to all resources, and (iii) even though LLMs can generate correct fixes, they may also delete other irrelevant configurations that break the application.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 5","pages":"132"},"PeriodicalIF":3.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12227474/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144575074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信
小红书