CG-FL: A data augmentation approach using context-aware genetic algorithm for fault localization

IF 3.7 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software Pub Date : 2025-01-29 DOI:10.1016/j.jss.2025.112359

Jian Hu

Volume 222, Article 112359

Full text: https://www.sciencedirect.com/science/article/pii/S0164121225000275

Citations: 0

Abstract

Fault localization (FL) is a critical step in software debugging. Coverage-based fault localization (CFL), one of the most promising FL techniques, uses coverage information obtained from the program entities executed by test cases to determine which entities are more likely to be faulty. However, CFL faces two main issues that limit its effectiveness. First, the code coverage data contains numerous statements that are irrelevant to the observed failure, which makes the search scope too large for FL. Second, the input coverage data is highly imbalanced because there are significantly more passing test cases than failing ones, which biases the FL model toward the passing test cases. To address these problems, we propose CG-FL, a data augmentation approach using a context-aware genetic algorithm. Specifically, CG-FL first uses program slicing to construct a failure context for FL. Subsequently, CG-FL generates synthesized failing test cases by applying the genetic algorithm. To evaluate the effectiveness of CG-FL, we compared it with six state-of-the-art FL methods and three representative data augmentation methods on 420 versions of 9 benchmarks. The experimental results indicate that CG-FL substantially enhances the effectiveness of the six FL methods and outperforms the three data augmentation methods.
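The two issues named in the abstract can be illustrated with a small sketch. It is not the CG-FL implementation: the abstract does not specify the suspiciousness formula, the slicer, or the genetic operators, so the code below assumes the widely used Ochiai formula as the CFL metric, takes the failure context as a given list of statement indices (standing in for the output of program slicing), and uses plain uniform crossover with bit-flip mutation in place of the paper's context-aware genetic algorithm. Fitness-based selection is omitted. All function names and the toy coverage matrix are hypothetical.

```python
import math
import random


def ochiai(coverage, failed):
    """Ochiai suspiciousness per statement; rows are tests, columns are statements."""
    total_failed = sum(failed)
    scores = []
    for j in range(len(coverage[0])):
        # ef / ep: failing / passing tests that execute statement j
        ef = sum(1 for row, f in zip(coverage, failed) if row[j] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[j] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores


def restrict_to_context(coverage, context):
    """Keep only the statement columns that belong to the failure context."""
    return [[row[j] for j in context] for row in coverage]


def synthesize_failing(failing_rows, n_new, rng, mutation_rate=0.05):
    """Create n_new coverage vectors by uniform crossover and bit-flip mutation."""
    children = []
    for _ in range(n_new):
        a, b = rng.choice(failing_rows), rng.choice(failing_rows)
        child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
        child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
        children.append(child)
    return children


# Hypothetical data: 5 tests x 4 statements, only the last test fails.
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
]
failed = [0, 0, 0, 0, 1]
context = [1, 2, 3]  # assumed slicer output: statement 0 is treated as irrelevant

# Step 1: shrink the search scope to the failure context.
reduced = restrict_to_context(coverage, context)

# Step 2: add synthesized failing rows until both classes have equal size.
failing_rows = [row for row, f in zip(reduced, failed) if f]
n_missing = failed.count(0) - failed.count(1)
synthetic = synthesize_failing(failing_rows, n_missing, random.Random(0))

balanced = reduced + synthetic
balanced_labels = failed + [1] * len(synthetic)
scores = ochiai(balanced, balanced_labels)
```

In this sketch the augmented matrix has four passing and four failing rows over three statements, so the ranking is computed on a smaller and balanced input. Whether synthesized rows are realistic depends on the operators and fitness function, which is the part the paper's context-aware design addresses and this sketch leaves out.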

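Source journal

Journal of Systems and Software (Engineering & Technology, Computer Science: Theory and Methods)

CiteScore: 8.60

Self-citation rate: 5.70%

Articles published: 193

Review time: 16 weeks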

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:

• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes

The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.