Automated Software Engineering: Latest Articles

Navigating bug cold start with contextual multi-armed bandits: an enhanced approach to developer assignment in software bug repositories
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-04-16 DOI: 10.1007/s10515-025-00508-6
Neetu Singh, Sandeep Kumar Singh
Volume 32, issue 2
Abstract: Recommending the most suitable developer for new bugs poses a challenge to triagers in software bug repositories. Bugs vary in components, severity, priority, and other significant attributes, making it difficult to address them promptly. This difficulty is further compounded by the lack of background knowledge on new bugs, which impedes traditional recommender systems. In the absence of adequate information about either a developer or a bug, building, training, and testing a conventional machine-learning model becomes arduous. In such scenarios, one potential solution is employing a reinforcement-learning model. Often, triagers resort to simplistic approaches like selecting a random developer (explore strategy) or one who has been assigned frequently (exploit strategy). However, the research presented here demonstrates that these approaches based on multi-armed bandits (MAB) perform inadequately. To address this, we propose a novel improved bandit approach that utilizes contextual or side information to automatically recommend suitable developers for new or cold bugs. Experiments conducted on five publicly available open-source datasets revealed that contextual MAB approaches outperformed simple MAB approaches. We additionally evaluated the efficacy of two MAB algorithms and four contextual-MAB algorithms. These algorithms were assessed on four performance metrics: rewards, average rewards, regret, and average regret. The experimental results present a thorough framework for developer recommendation. The results indicate that all contextual-MAB approaches consistently outperform MAB approaches.
Citations: 0
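The explore/exploit contrast in the abstract above can be illustrated with a small simulation. The sketch below is not the authors' method: it is a plain epsilon-greedy bandit run once with and once without a context key, and the developer names, bug components, and fix probabilities are all hypothetical. It reports the total reward and total regret, two of the four metrics the abstract names.

```python
import random

def run_bandit(use_context, rounds=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy developer assignment; returns (total_reward, total_regret)."""
    rng = random.Random(seed)
    # Hypothetical ground truth: chance that a developer fixes a bug of a component.
    fix_prob = {
        "ui":      {"alice": 0.9, "bob": 0.2},
        "backend": {"alice": 0.1, "bob": 0.8},
    }
    developers = ["alice", "bob"]
    counts, means = {}, {}  # running statistics per (context key, developer)
    total_reward = total_regret = 0.0
    for _ in range(rounds):
        component = rng.choice(list(fix_prob))
        key = component if use_context else "any"  # the plain MAB ignores context
        if rng.random() < epsilon:                 # explore: random developer
            dev = rng.choice(developers)
        else:                                      # exploit: best estimate so far
            dev = max(developers, key=lambda d: means.get((key, d), 0.0))
        reward = 1.0 if rng.random() < fix_prob[component][dev] else 0.0
        n = counts.get((key, dev), 0) + 1
        counts[(key, dev)] = n
        old = means.get((key, dev), 0.0)
        means[(key, dev)] = old + (reward - old) / n  # incremental mean
        total_reward += reward
        # Regret: expected reward lost relative to the best developer for this bug.
        total_regret += max(fix_prob[component].values()) - fix_prob[component][dev]
    return total_reward, total_regret
```

Calling `run_bandit(True)` and `run_bandit(False)` and dividing each result by `rounds` gives the average reward and average regret. In this toy setup neither developer is best overall, so the context-free variant cannot do much better than chance.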
The impact of unsupervised feature selection techniques on the performance and interpretation of defect prediction models
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-04-16 DOI: 10.1007/s10515-025-00510-y
Zhiqiang Li, Wenzhi Zhu, Hongyu Zhang, Yuantian Miao, Jie Ren
Volume 32, issue 2
Abstract: The performance and interpretation of a defect prediction model depend on the software metrics utilized in its construction. Feature selection techniques can enhance model performance and interpretation by effectively removing redundant, correlated, and irrelevant metrics from defect datasets. Previous empirical studies have scrutinized the impact of feature selection techniques on the performance and interpretation of defect prediction models. However, most feature selection techniques examined in these studies are supervised. In particular, the impact of unsupervised feature selection (UFS) techniques on defect prediction remains unknown and needs to be explored extensively. To address this gap, we systematically apply 21 UFS techniques to evaluate their impact on the performance and interpretation of unsupervised defect prediction models in binary classification and effort-aware ranking scenarios. Extensive experiments are conducted on 28 versions of 8 projects using 4 unsupervised models. We observe that: (1) 10–100% of the selected metrics are inconsistent between each pair of UFS techniques. (2) 29–100% of the selected metrics are inconsistent among different software modules. (3) For unsupervised defect prediction models, some UFS techniques (e.g., AutoSpearman, LS, and FMIUFS) can effectively reduce the number of metrics while maintaining or even improving model performance. (4) UFS techniques alter the ranking of the top 3 groups of metrics in defect models, affecting the interpretation of these models. Based on these findings, we recommend that software practitioners utilize UFS techniques for unsupervised defect prediction. However, caution should be exercised when deriving insights and interpretations from defect prediction models.
Citations: 0
SIFT: enhance the performance of vulnerability detection by incorporating structural knowledge and multi-task learning
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-04-11 DOI: 10.1007/s10515-025-00507-7
Liping Wang, Guilong Lu, Xiang Chen, Xiaofeng Dai, Jianlin Qiu
Volume 32, issue 2
Abstract: Software vulnerabilities pose significant risks to software systems, leading to security breaches, data loss, operational disruptions, and substantial financial damage. Therefore, accurately detecting these vulnerabilities is of paramount importance. In recent years, pre-trained language models (PLMs) have demonstrated powerful capabilities in code representation and understanding, emerging as a promising method for vulnerability detection. However, integrating code structure knowledge while fine-tuning PLMs remains a significant challenge. To alleviate this limitation, we propose a novel vulnerability detection approach called SIFT. SIFT extracts the code property graph (CPG) to serve as the source of graph structural information. It constructs a code structure matrix from this information and measures the difference between the code structure matrix and the attention matrix using Sinkhorn Divergence to obtain the structural knowledge loss. This structural knowledge loss is then used alongside the cross-entropy loss for vulnerability detection in a multi-task learning framework to enhance overall detection performance. To evaluate the effectiveness of SIFT, we conducted experiments on three vulnerability detection datasets: FFmpeg+Qemu, Chrome+Debian, and Big-Vul. The results demonstrate that SIFT outperforms nine state-of-the-art vulnerability detection baselines, achieving F1-score improvements of 1.74%, 10.19%, and 2.87%, respectively. Our study shows the effectiveness of incorporating structural knowledge and multi-task learning in enhancing the performance of PLMs for vulnerability detection.
Citations: 0
Synthetic versus real: an analysis of critical scenarios for autonomous vehicle testing
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-04-09 DOI: 10.1007/s10515-025-00499-4
Qunying Song, Avner Bensoussan, Mohammad Reza Mousavi
Volume 32, issue 2
PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00499-4.pdf
Abstract: With the emergence of autonomous vehicles comes the requirement of adequate and rigorous testing, particularly in critical scenarios that are both challenging and potentially hazardous. Generating synthetic simulation-based critical scenarios for testing autonomous vehicles has therefore received considerable interest, yet it is unclear how such scenarios relate to the actual crash or near-crash scenarios in the real world. Consequently, their realism is unknown. In this paper, we define realism as the degree of similarity of synthetic critical scenarios to real-world critical scenarios. We propose a methodology to measure realism using two metrics, namely attribute distribution and Euclidean distance. The methodology extracts various attributes from synthetic and realistic critical scenario datasets and performs a set of statistical tests to compare their distributions and distances. As a proof of concept for our methodology, we compare synthetic collision scenarios from DeepScenario against realistic autonomous vehicle collisions collected by the Department of Motor Vehicles in California, to analyse how well DeepScenario synthetic collision scenarios are aligned with real autonomous vehicle collisions recorded in California. We focus on five key attributes that are extractable from both datasets, and analyse the attribute distribution and distance between scenarios in the two datasets. Further, we derive recommendations to improve the realism of synthetic scenarios based on our analysis. Our study of realism provides a framework that can be replicated and extended to other datasets, both real-world and synthetically generated.
Citations: 0
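The Euclidean-distance metric mentioned in the abstract above can be sketched as a nearest-neighbour comparison between scenario attribute vectors. This is only an illustration under assumptions: the three attributes, the sample values, and the min-max scaling step are hypothetical and are not taken from the paper, which uses five attributes and statistical tests not reproduced here.

```python
import math

def min_max_scale(rows):
    """Scale each attribute column to [0, 1] so no attribute dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def nearest_real_distance(synthetic, real):
    """For each synthetic scenario, Euclidean distance to the closest real scenario."""
    scaled = min_max_scale(synthetic + real)
    syn, rea = scaled[:len(synthetic)], scaled[len(synthetic):]
    return [min(math.dist(s, r) for r in rea) for s in syn]

# Hypothetical attributes: (ego speed in km/h, number of other vehicles, hour of day)
synthetic = [(30.0, 2, 14), (120.0, 9, 3)]
real = [(28.0, 2, 15), (45.0, 3, 9)]
distances = nearest_real_distance(synthetic, real)
```

A small distance suggests the synthetic scenario resembles some recorded collision; here the first synthetic scenario sits much closer to the real data than the second.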
Pipe-DBT: enhancing dynamic binary translation simulators to support pipeline-level simulation
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-04-05 DOI: 10.1007/s10515-025-00506-8
Tiancheng Tang, Yi Man, Xinbing Zhou, Duqing Wang
Volume 32, issue 2
PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00506-8.pdf
Abstract: In response to the lack of pipeline behavior modeling in Instruction-Set Simulators (ISS) and the performance limitations of Cycle-Accurate Simulators (CAS), this paper proposes Pipe-DBT, a pipeline simulation framework based on Dynamic Binary Translation (DBT). This method achieves a balance between accuracy and efficiency through two key techniques: (1) the design of a pipeline state descriptor called Pipsdep, which abstracts data hazards and resource contentions in the form of formal rules about resource occupancy and read/write behaviors, thereby avoiding low-level hardware details; (2) the introduction of a coroutine-based instruction execution flow partitioning mechanism that employs dynamic suspension/resumption to realize cycle-accurate scheduling in multi-stage pipelines. Implemented on QEMU, Pipe-DBT supports variable-length pipelines, a Very Long Instruction Word (VLIW) architecture with four-issue capability, and pipeline forwarding. Under typical DSP workloads, it achieves a simulation speed of 400–1100 KIPS, representing a 2.3× improvement over Gem5 in cycle-accurate mode. Experimental results show that only modular extensions to the host DBT framework are required to accommodate heterogeneous pipeline microarchitectures, thereby providing a high-throughput simulation infrastructure for processor design. To the best of our knowledge, this is the first pipeline-level simulation model implemented on a DBT simulator.
Citations: 0
An empirical study on the code naturalness modeling capability for LLMs in automated patch correctness assessment
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-04-02 DOI: 10.1007/s10515-025-00502-y
Yuning Li, Wenkang Zhong, Zongwen Shen, Chuanyi Li, Xiang Chen, Jidong Ge, Bin Luo
Volume 32, issue 2
Abstract: Just like natural language, code can exhibit naturalness. This property manifests in highly repetitive patterns within specific contexts. Code naturalness can be captured by language models and then applied to various software engineering tasks (such as fault localization and program repair). Recently, Large Language Models (LLMs) based on Transformers have become advantageous tools for modeling code naturalness. However, existing work lacks systematic studies on the code naturalness modeling capability of LLMs. To bridge this gap, this paper explores that capability, starting with the task of automated patch correctness assessment. Specifically, we investigate whether LLMs with different architectures and scales, under varying context window sizes, (1) can identify buggy code from common code based on naturalness and consider fixed code more natural than buggy code, and (2) can distinguish different degrees of repairs (i.e., complete repairs and incomplete repairs) from automated tools. Then, we propose metrics to assess these two capabilities of the models. Experimental results indicate that models with different architectures and scales have the code naturalness modeling capability, even models not specifically pre-trained on code. Additionally, smaller models do not necessarily exhibit weaker modeling capability than larger models. We also find that more contextual information provides only limited benefits. Based on the experimental findings, we select the best-performing model, which has 220M parameters, to develop an Entropy-based Automated Patch Correctness Assessment (E-APCA) approach that calculates code naturalness. On the large-scale dataset PraPatch, E-APCA surpasses traditional methods by over 20% across various evaluation metrics. Compared to the latest APCA method, Entropy-delta, which is based on a 6.7B LLM, E-APCA achieves a 17.32% higher correct patch recall and a 6.83% higher F1 score, while the reasoning time is less than 7% of that required by Entropy-delta.
Citations: 0
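The idea that fixed code should look "more natural" than buggy code can be made concrete with a cross-entropy comparison. The sketch below is a stand-in, not the E-APCA approach: it uses a tiny bigram model with add-one smoothing instead of an LLM, and the training corpus and code snippets are invented for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count context tokens and bigrams over tokenised lines of code."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        seq = ["<s>"] + tokens
        vocab.update(seq)
        unigrams.update(seq[:-1])           # counts of tokens used as context
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams, vocab

def cross_entropy(tokens, model):
    """Average negative log2-probability per token (lower means more natural)."""
    unigrams, bigrams, vocab = model
    seq = ["<s>"] + tokens
    v = len(vocab) + 1                      # +1 reserves mass for unseen tokens
    bits = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + v)  # add-one smoothing
        bits -= math.log2(p)
    return bits / len(tokens)

# Invented toy corpus and snippets, for illustration only.
corpus = [
    "for i in range ( n ) :".split(),
    "for j in range ( m ) :".split(),
    "for k in range ( len ( xs ) ) :".split(),
]
model = train_bigram(corpus)
buggy = "for i range in ( n ) :".split()
patched = "for i in range ( n ) :".split()
delta = cross_entropy(buggy, model) - cross_entropy(patched, model)
```

A positive `delta` means the patch lowered the entropy, which an entropy-based assessor would read as weak evidence that the patch moves the code toward familiar patterns.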
Ladle: a method for unsupervised anomaly detection across log types
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-03-24 DOI: 10.1007/s10515-025-00504-w
Juha Mylläri, Tatu Aalto, Jukka K. Nurminen
Volume 32, issue 2
PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00504-w.pdf
Abstract: Log files can help detect and diagnose erroneous software behaviour, but their utility is limited by the ability of users and developers to sift through large amounts of text. Unsupervised machine learning tools have been developed to automatically find anomalies in logs, but they are usually not designed for situations where a large number of log streams or log files, each with its own characteristics, need to be analyzed and their anomaly scores compared. We propose Ladle, an accurate unsupervised anomaly detection and localization method that can simultaneously learn the characteristics of hundreds of log types and determine which log entries are the most anomalous across these log types. Ladle uses a sentence transformer (a large language model) to embed short overlapping segments of log files and compares new, potentially anomalous, log segments against a collection of reference data. The result of the comparison is re-centered by subtracting a baseline score indicating how much variation tends to occur in each log type, making anomaly scores comparable across log types. Ladle is designed to adapt to data drift and is updated by adding new reference data without the need to retrain the sentence transformer. We demonstrate the accuracy of Ladle on a real-world dataset consisting of logs produced by an endpoint protection platform test suite. We also compare Ladle's performance on the dataset to that of a state-of-the-art method for single-log anomaly detection, showing that the latter is inadequate for the multi-log task.
Citations: 0
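The re-centering step described in the abstract above (subtracting a per-log-type baseline so that scores become comparable) can be sketched as follows. Everything specific here is an assumption: the toy 2-D vectors stand in for sentence-transformer embeddings, and the choice of nearest-neighbour distance and a leave-one-out mean as the baseline is one plausible reading, not Ladle's published formula.

```python
import math
from statistics import mean

def nearest_distance(vec, reference):
    """Distance from one embedding to its closest reference embedding."""
    return min(math.dist(vec, ref) for ref in reference)

def baseline(reference):
    """Typical variation in a log type: mean leave-one-out nearest-neighbour distance."""
    return mean(
        nearest_distance(vec, reference[:i] + reference[i + 1:])
        for i, vec in enumerate(reference)
    )

def anomaly_score(vec, reference):
    """Raw nearest-neighbour distance re-centred by the log type's own baseline."""
    return nearest_distance(vec, reference) - baseline(reference)

# Toy 2-D vectors stand in for embeddings of log segments (hypothetical data).
references = {
    "quiet_log": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],  # segments look alike
    "noisy_log": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)],  # segments vary a lot
}
new_segments = {"quiet_log": (1.0, 1.0), "noisy_log": (1.0, 1.0)}
scores = {
    name: anomaly_score(vec, references[name])
    for name, vec in new_segments.items()
}
```

The same new vector is flagged as anomalous in the quiet log type but not in the noisy one, which is the effect the baseline subtraction is meant to achieve. Adding new reference vectors to a list updates the detector without retraining any embedding model.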
Requirement falsification for cyber-physical systems using generative models
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-03-23 DOI: 10.1007/s10515-025-00503-x
Jarkko Peltomäki, Ivan Porres
Volume 32, issue 2
PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00503-x.pdf
Abstract: We present the OGAN algorithm for automatic requirement falsification of cyber-physical systems. System inputs and outputs are represented as piecewise constant signals over time, while requirements are expressed in signal temporal logic. OGAN can find inputs that are counterexamples for the correctness of a system, revealing design, software, or hardware defects before the system is taken into operation. The OGAN algorithm works by training a generative machine learning model to produce such counterexamples. It executes tests offline and does not require any previous model of the system under test. We evaluate OGAN using the ARCH-COMP benchmark problems, and the experimental results show that generative models are a viable method for requirement falsification. OGAN can be applied to new systems with little effort, has few requirements for the system under test, and exhibits state-of-the-art CPS falsification efficiency and effectiveness.
Citations: 0
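Requirement falsification, as described in the abstract above, amounts to searching for an input whose output trace has negative robustness for a temporal-logic requirement. The sketch below shows that loop for the requirement "always (speed <= limit)". It is not OGAN: plain random search replaces the generative model, and the toy system, the limit, and all function names are hypothetical.

```python
import random

def system_under_test(throttle):
    """Hypothetical toy system: speed follows the throttle with a first-order lag."""
    speed, trace = 0.0, []
    for u in throttle:          # one piecewise-constant input segment per step
        speed += 0.5 * (100.0 * u - speed)
        trace.append(speed)
    return trace

def robustness_always_below(trace, limit):
    """Robustness of 'always (speed <= limit)': a negative value means violation."""
    return min(limit - value for value in trace)

def falsify(limit, segments=5, budget=200, seed=0):
    """Random-search stand-in for a generative model; returns (input, robustness)."""
    rng = random.Random(seed)
    best_input, best_rob = None, float("inf")
    for _ in range(budget):
        candidate = [rng.random() for _ in range(segments)]
        rob = robustness_always_below(system_under_test(candidate), limit)
        if rob < best_rob:
            best_input, best_rob = candidate, rob
        if best_rob < 0:        # counterexample found, stop early
            break
    return best_input, best_rob
```

Calling `falsify(limit=60.0)` returns a throttle signal together with its robustness; a negative robustness marks the signal as a counterexample to the requirement. The system is treated as a black box, which matches the abstract's point that no prior model of the system under test is needed.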
Tab: template-aware bug report title generation via two-phase fine-tuned models
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-03-22 DOI: 10.1007/s10515-025-00505-9
Xiao Liu, Yinkang Xu, Weifeng Sun, Naiqi Huang, Song Sun, Qiang Li, Dan Yang, Meng Yan
Volume 32, issue 2
Abstract: Bug reports play a critical role in the software development lifecycle by helping developers identify and resolve defects efficiently. However, the quality of bug report titles, particularly in open-source communities, can vary significantly, which complicates the bug triage and resolution processes. Existing approaches, such as iTAPE, treat title generation as a one-sentence summarization task using sequence-to-sequence models. While these methods show promise, they face two major limitations: (1) they do not consider the distinct components of bug reports, treating the entire report as a homogeneous input, and (2) they struggle to handle the variability between template-based and non-template-based reports, often resulting in suboptimal titles. To address these limitations, we propose TAB, a hybrid framework that combines a Document Component Analyzer (DCA) based on a pre-trained BERT model and a Title Generation Model based on CodeT5. TAB addresses the first limitation by segmenting bug reports into four components (Description, Reproduction, Expected Behavior, and Others) to ensure better alignment between input and output. For the second limitation, TAB uses a divergent approach: for template-based reports, titles are generated directly, while for non-template reports, DCA extracts key components to improve title relevance and clarity. We evaluate TAB on both template-based and non-template-based bug reports, demonstrating that it significantly outperforms existing methods. Specifically, TAB achieves average improvements of 170.4–389.5% in METEOR, 67.8–190.0% in ROUGE-L, and 65.7–124.5% in chrF(AF) compared to baseline approaches on template-based reports. Additionally, on non-template-based reports, TAB shows an average improvement of 64% in METEOR, 3.6% in ROUGE-L, and 14.8% in chrF(AF) over the state-of-the-art. These results confirm the robustness of TAB in generating high-quality titles across diverse bug report formats.
Citations: 0
Reinforcement learning for mutation operator selection in automated program repair
IF 2, Zone 2, Computer Science
Automated Software Engineering Pub Date : 2025-03-15 DOI: 10.1007/s10515-025-00501-z
Carol Hanna, Aymeric Blot, Justyna Petke
Volume 32, issue 2
PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00501-z.pdf
Abstract: Automated program repair techniques aim to aid software developers with the challenging task of fixing bugs. In heuristic-based program repair, a search space of mutated program variants is explored to find potential patches for bugs. Most commonly, every selection of a mutation operator during search is performed uniformly at random, which can generate many buggy, even uncompilable programs. Our goal is to reduce the generation of variants that do not compile or that break intended functionality, which wastes considerable resources. In this paper, we investigate the feasibility of a reinforcement learning-based approach for the selection of mutation operators in heuristic-based program repair. Our proposed approach is programming language, granularity-level, and search strategy agnostic and allows for easy augmentation into existing heuristic-based repair tools. We conducted an extensive empirical evaluation of four operator selection techniques, two reward types, two credit assignment strategies, two integration methods, and three sets of mutation operators using 30,080 independent repair attempts. We evaluated our approach on 353 real-world bugs from the Defects4J benchmark. The reinforcement learning-based mutation operator selection results in a higher number of test-passing variants, but does not exhibit a noticeable improvement in the number of bugs patched in comparison with the baseline, uniform random selection. While reinforcement learning has been previously shown to be successful in improving the search of evolutionary algorithms, often used in heuristic-based program repair, it has yet to demonstrate such improvements when applied to this area of research.
Citations: 0
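Replacing uniform random operator choice with a learned policy, as the abstract above describes, can be sketched as a bandit loop over mutation operators. The abstract does not name the four selection techniques it evaluates, so UCB1 is used here purely as an illustrative choice; the operator names, success rates, and reward definition (1 when a variant compiles and passes the tests) are hypothetical.

```python
import math
import random

def select_operator(counts, values, step):
    """UCB1: try each operator once, then pick by mean reward plus exploration bonus."""
    for op, n in counts.items():
        if n == 0:
            return op
    return max(
        counts,
        key=lambda op: values[op] + math.sqrt(2 * math.log(step) / counts[op]),
    )

def run_search(success_rate, attempts=1000, seed=0):
    """Simulated repair loop; reward is 1 when the variant compiles and passes tests."""
    rng = random.Random(seed)
    counts = {op: 0 for op in success_rate}
    values = {op: 0.0 for op in success_rate}
    for step in range(1, attempts + 1):
        op = select_operator(counts, values, step)
        reward = 1.0 if rng.random() < success_rate[op] else 0.0
        counts[op] += 1
        values[op] += (reward - values[op]) / counts[op]  # running mean reward
    return counts

# Hypothetical operators and success rates, for illustration only.
usage = run_search({"delete_stmt": 0.05, "replace_stmt": 0.60, "insert_stmt": 0.10})
```

The returned counts show the search concentrating on the operator that most often yields test-passing variants. As the abstract reports, producing more test-passing variants did not translate into more patched bugs in the study, so a loop like this optimises the reward it is given rather than repair success itself.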