Semantic context based coincidental correct test cases detection for fault localization

IF 2 2区计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering Pub Date : 2024-08-18 DOI:10.1007/s10515-024-00466-5

Jian Hu

{"title":"Semantic context based coincidental correct test cases detection for fault localization","authors":"Jian Hu","doi":"10.1007/s10515-024-00466-5","DOIUrl":null,"url":null,"abstract":"<div><p>Fault localization is a process that aims to identify the potentially faulty statements responsible for program failures by analyzing runtime information. Therefore, the input code coverage matrix plays a crucial role in FL. However, the effectiveness of fault localization is compromised by the presence of coincidental correct test cases (CCTC) in the coverage matrix. These CCTC execute faulty code but do not result in program failures. To address this issue, many existing methods focus on identifying CCTC through cluster analysis. However, these methods have three problems. Firstly, identifying the optimal cluster count poses a considerable challenge in CCTC detection. Secondly, the effectiveness of CCTC detection is heavily influenced by the initial centroid selection. Thirdly, the presence of abundant fault-irrelevant statements within the raw coverage matrix introduces substantial noise for CCTC detection. To overcome these challenges, we propose SCD4FL: a semantic context-based CCTC detection method to enhance the coverage matrix for fault localization. SCD4FL incorporates and implements two key ideas: (1) SCD4FL uses the intersection of execution slices to construct a semantic context from the raw coverage matrix, effectively reducing noise during CCTC detection. (2) SCD4FL employs an expert-knowledge-based K-nearest neighbors (KNN) algorithm to detect the CCTC, effectively eliminating the requirement of determining the cluster number and initial centroid. To evaluate the effectiveness of SCD4FL, we conducted extensive experiments on 420 faulty versions of nine benchmarks using six state-of-the-art fault localization methods and two representative CCTC detection methods. The experimental results validate the effectiveness of our method in enhancing the performance of the six fault localization methods and two CCTC detection methods, e.g., the RNN method can be improved by 53.09% under the MFR metric.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Automated Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10515-024-00466-5","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Fault localization is a process that aims to identify the potentially faulty statements responsible for program failures by analyzing runtime information. Therefore, the input code coverage matrix plays a crucial role in FL. However, the effectiveness of fault localization is compromised by the presence of coincidental correct test cases (CCTC) in the coverage matrix. These CCTC execute faulty code but do not result in program failures. To address this issue, many existing methods focus on identifying CCTC through cluster analysis. However, these methods have three problems. Firstly, identifying the optimal cluster count poses a considerable challenge in CCTC detection. Secondly, the effectiveness of CCTC detection is heavily influenced by the initial centroid selection. Thirdly, the presence of abundant fault-irrelevant statements within the raw coverage matrix introduces substantial noise for CCTC detection. To overcome these challenges, we propose SCD4FL: a semantic context-based CCTC detection method to enhance the coverage matrix for fault localization. SCD4FL incorporates and implements two key ideas: (1) SCD4FL uses the intersection of execution slices to construct a semantic context from the raw coverage matrix, effectively reducing noise during CCTC detection. (2) SCD4FL employs an expert-knowledge-based K-nearest neighbors (KNN) algorithm to detect the CCTC, effectively eliminating the requirement of determining the cluster number and initial centroid. To evaluate the effectiveness of SCD4FL, we conducted extensive experiments on 420 faulty versions of nine benchmarks using six state-of-the-art fault localization methods and two representative CCTC detection methods. The experimental results validate the effectiveness of our method in enhancing the performance of the six fault localization methods and two CCTC detection methods, e.g., the RNN method can be improved by 53.09% under the MFR metric.

Abstract Image

查看原文本刊更多论文

基于语义上下文的故障定位重合正确测试用例检测

故障定位的目的是通过分析运行时信息，找出造成程序故障的潜在错误语句。因此，输入代码覆盖矩阵在 FL 中起着至关重要的作用。然而，覆盖矩阵中存在的巧合正确测试用例（CCTC）会影响故障定位的效果。这些 CCTC 会执行有问题的代码，但不会导致程序故障。为解决这一问题，许多现有方法都侧重于通过聚类分析来识别 CCTC。然而，这些方法存在三个问题。首先，在 CCTC 检测中，确定最佳聚类数是一个相当大的挑战。其次，CCTC 检测的有效性在很大程度上受初始中心点选择的影响。第三，原始覆盖矩阵中存在大量与故障无关的语句，这为 CCTC 检测带来了大量噪声。为了克服这些挑战，我们提出了 SCD4FL：一种基于语义上下文的 CCTC 检测方法，用于增强故障定位的覆盖矩阵。SCD4FL 融合并实现了两个关键理念：(1) SCD4FL 利用执行片段的交集从原始覆盖矩阵中构建语义上下文，从而有效降低 CCTC 检测过程中的噪声。(2) SCD4FL 采用基于专家知识的 K-nearest neighbors (KNN) 算法来检测 CCTC，从而有效消除了确定聚类数和初始中心点的要求。为了评估 SCD4FL 的有效性，我们使用六种最先进的故障定位方法和两种有代表性的 CCTC 检测方法，对九种基准的 420 个故障版本进行了大量实验。实验结果验证了我们的方法在提高六种故障定位方法和两种 CCTC 检测方法性能方面的有效性，例如，在 MFR 指标下，RNN 方法的性能提高了 53.09%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Automated Software Engineering 工程技术-计算机：软件工程

CiteScore

4.80

自引率

11.80%

发文量

审稿时长

>12 weeks

期刊介绍： This journal details research, tutorial papers, survey and accounts of significant industrial experience in the foundations, techniques, tools and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.