Hengyuan Liu , Zheng Li , Xiaolan Kang , Shumei Wu , Doyle Paul , Xiang Chen , Yong Liu
Information and Software Technology, Volume 188, Article 107873. Published 2025-08-20. DOI: 10.1016/j.infsof.2025.107873. Impact factor: 4.3 (JCR Q2, Computer Science, Information Systems). URL: https://www.sciencedirect.com/science/article/pii/S0950584925002125
SCOPE: Hybrid optimization strategy for higher-order mutation-based fault localization
Context:
Mutation-Based Fault Localization (MBFL) using Higher-Order Mutants (HOMs) has achieved promising performance on multiple-fault programs by simulating more realistic faults. Despite its effectiveness, it can be extremely costly because numerous HOMs must be executed. Existing cost-optimization strategies, however, mainly focus on first-order mutants (FOMs) and do not consider the dependency relationships between HOMs and multiple program entities.
Objective:
In this article, we propose a novel strategy called Smart Cost-Optimization through dynamic Prediction and sampling Execution (SCOPE). It aims to reduce costs while providing rich mutation analysis information.
Methods:
SCOPE contains two key components: a Smart HOM Sampler and a Mutant-Testing Predictor. The former pre-selects the most promising HOMs for each program entity to execute, based on their association with suspicious program entities. The latter employs machine learning to infer the impact of the remaining HOMs on tests using test execution data from selected HOMs, without the need for actual execution.
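The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout, the function names `sample_homs` and `predict_kills`, the mean-suspiciousness ranking, and the prior `suspiciousness` scores are all assumptions made for illustration. In particular, the predictor below is a deliberately simple neighbor-vote stand-in for the machine learning model, which the abstract does not specify in detail.

```python
from math import ceil


def sample_homs(homs, suspiciousness, rate):
    """Per entity, keep the top `rate` fraction of the HOMs that mutate it.

    homs: dict hom_id -> set of mutated entities (hypothetical layout).
    suspiciousness: dict entity -> prior score (assumed, e.g. from SBFL).
    Returns the set of HOM ids selected for real execution.
    """
    def score(hom_id):
        entities = homs[hom_id]
        return sum(suspiciousness.get(e, 0.0) for e in entities) / len(entities)

    selected = set()
    for entity in {e for ents in homs.values() for e in ents}:
        # Rank this entity's HOMs by score, breaking ties by id.
        candidates = sorted((h for h in homs if entity in homs[h]),
                            key=lambda h: (-score(h), h))
        selected.update(candidates[:max(1, ceil(rate * len(candidates)))])
    return selected


def predict_kills(homs, executed_kills, tests):
    """Predict which tests kill each HOM that was not executed.

    executed_kills: dict hom_id -> set of tests that killed that HOM.
    A skipped HOM is predicted killed by a test when a strict majority of
    the executed HOMs sharing an entity with it were killed by that test.
    """
    predicted = {}
    for hom_id, entities in homs.items():
        if hom_id in executed_kills:
            continue
        neighbours = [h for h in executed_kills if homs[h] & entities]
        predicted[hom_id] = {
            t for t in tests
            if neighbours
            and 2 * sum(t in executed_kills[h] for h in neighbours) > len(neighbours)
        }
    return predicted


# Toy data: six second-order mutants over four statements.
homs = {
    "h1": {"s1", "s2"}, "h2": {"s1", "s3"}, "h3": {"s2", "s3"},
    "h4": {"s3", "s4"}, "h5": {"s1", "s4"}, "h6": {"s2", "s4"},
}
suspiciousness = {"s1": 0.9, "s2": 0.6, "s3": 0.2, "s4": 0.1}
selected = sample_homs(homs, suspiciousness, rate=0.3)

# Pretend only the selected HOMs were run against the test suite.
executed_kills = {"h1": {"t1", "t2"}, "h2": {"t1"}, "h5": {"t1", "t3"}}
predicted = predict_kills(homs, executed_kills, ["t1", "t2", "t3"])
```

In this toy run only three of the six HOMs are executed, and the kill information for the other three is inferred. A learned classifier, such as the Random Forest the abstract reports as the best performer, would take the place of the neighbor vote.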
Results:
(1) SCOPE outperforms state-of-the-art optimization strategies, including SELECTIVE, SAMPLING, and PMT, regardless of the sampling rate or MBFL formula adopted. (2) SCOPE can reduce the number of involved HOMs by up to 90% without any loss in MBFL performance. (3) SCOPE outperforms baseline methods including SBFL, three optimized MBFL techniques (WSOME, SGS, HMBFL), and two deep learning-based fault localization techniques (CNNFL and RNNFL). (4) An ablation experiment confirms that both the Smart HOM Sampler and the Mutant-Testing Predictor contribute positively to the effectiveness of SCOPE, with average improvements of 23.60% and 15.14% in TOP-1 and A-EXAM. Additionally, a comparison of machine learning models for the Mutant-Testing Predictor shows that Random Forest achieves better prediction performance than Logistic Regression and Naive Bayes.
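For readers unfamiliar with the two metrics, the following sketch shows how TOP-N and EXAM are commonly computed in the fault localization literature. The abstract does not define them, so the details are assumptions: ties are resolved pessimistically, and A-EXAM is taken to mean the EXAM score averaged over a program's faults. The function names are hypothetical.

```python
def rank_of(entity, scores):
    """1-based worst-case rank: entities tied with `entity` count ahead of it."""
    return sum(1 for s in scores.values() if s >= scores[entity])


def top_n(faulty, scores, n):
    """Number of faulty entities ranked within the top n positions."""
    return sum(1 for e in faulty if rank_of(e, scores) <= n)


def exam(entity, scores):
    """Fraction of entities inspected before reaching `entity` (lower is better)."""
    return rank_of(entity, scores) / len(scores)


def average_exam(faulty, scores):
    """Mean EXAM over all faults (assumed here to correspond to A-EXAM)."""
    return sum(exam(e, scores) for e in faulty) / len(faulty)


# Toy suspiciousness ranking with two faulty statements.
scores = {"s1": 0.9, "s2": 0.7, "s3": 0.7, "s4": 0.1}
faulty = ["s1", "s3"]
top1 = top_n(faulty, scores, 1)
top3 = top_n(faulty, scores, 3)
a_exam = average_exam(faulty, scores)
```

Under these assumptions a higher TOP-1 and a lower A-EXAM both indicate better localization, which is the sense in which the reported improvements should be read.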
Conclusions:
Evaluation on 135 real-world multiple-fault programs from the widely used Defects4J benchmark shows the effectiveness of our proposed hybrid optimization strategy, SCOPE, for higher-order mutation-based fault localization.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.