CG-FL: A data augmentation approach using context-aware genetic algorithm for fault localization
Author: Jian Hu
Journal: Journal of Systems and Software, volume 222, Article 112359
Publication date: 2025-01-29
DOI: 10.1016/j.jss.2025.112359
URL: https://www.sciencedirect.com/science/article/pii/S0164121225000275
Impact factor: 3.7; JCR: Q1 (Computer Science, Software Engineering)
Citations: 0
Abstract
Fault localization (FL) is a critical step in software debugging. Coverage-based fault localization (CFL), one of the most promising FL techniques, uses coverage information obtained from the program entities executed by test cases to determine which entities are more likely to be faulty. However, two main issues limit its effectiveness. First, the code coverage data contains numerous statements irrelevant to the observed failure, which makes the search scope too large for FL. Second, the input coverage data is highly imbalanced because passing test cases significantly outnumber failing ones, which biases the FL model toward the passing test cases. To address these problems, we propose CG-FL, a data augmentation approach using a context-aware genetic algorithm. Specifically, CG-FL first uses program slicing to construct a failure context for FL. Subsequently, CG-FL generates synthesized failing test cases by applying the genetic algorithm. To evaluate the effectiveness of CG-FL, we compared it with six state-of-the-art FL methods and three representative data augmentation methods on 420 versions of 9 benchmarks. The experimental findings indicate that CG-FL substantially enhances the effectiveness of the six FL methods and outperforms the three data augmentation methods.
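To make the coverage-based setting concrete, here is a minimal sketch of how a CFL technique might rank statements from a coverage matrix. It uses the well-known Ochiai formula purely as an illustration; the abstract does not say which formulas or models CG-FL is paired with. The function name `ochiai_scores`, the toy data, and the `context` parameter (a hypothetical stand-in for a slice-based failure context) are assumptions. The genetic-algorithm augmentation step is not shown.

```python
import math


def ochiai_scores(coverage, failed, context=None):
    """Return {statement_index: Ochiai suspiciousness}.

    coverage: list of rows, one per test; row[j] is 1 if the test ran statement j.
    failed:   list of bools, True where the corresponding test failed.
    context:  optional set of statement indices to keep (hypothetical stand-in
              for a failure context); None keeps every statement.
    """
    total_failed = sum(failed)
    n_statements = len(coverage[0])
    statements = range(n_statements) if context is None else sorted(context)
    scores = {}
    for j in statements:
        # ef / ep: failing / passing tests that executed statement j.
        ef = sum(1 for row, f in zip(coverage, failed) if f and row[j])
        ep = sum(1 for row, f in zip(coverage, failed) if not f and row[j])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[j] = ef / denom if denom else 0.0
    return scores


# Toy, imbalanced data (assumed): 4 passing tests, 1 failing test, 4 statements.
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],  # the only failing test
]
failed = [False, False, False, False, True]

all_scores = ochiai_scores(coverage, failed)
ranking = sorted(all_scores, key=all_scores.get, reverse=True)

# Restricting to a (hypothetical) failure context shrinks the search scope.
context_scores = ochiai_scores(coverage, failed, context={1, 2})
```

With a single failing test, every statement it executes receives a non-zero score, so the ranking depends heavily on how often passing tests cover the same statements. This is the kind of imbalance that synthesized failing test cases are meant to mitigate.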
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.