HetFL: Heterogeneous Graph-Based Software Fault Localization

IF 6.5 · Zone 1 (Computer Science) · Q1 (Computer Science, Software Engineering)

IEEE Transactions on Software Engineering Pub Date : 2024-09-05 DOI:10.1109/TSE.2024.3454605

Xin Chen;Tian Sun;Dongling Zhuang;Dongjin Yu;He Jiang;Zhide Zhou;Sicheng Li

IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2884-2905. Journal article, not open access. https://ieeexplore.ieee.org/document/10666908/

Citations: 0

Abstract

Automated software fault localization has become a research hot spot in recent years. Existing studies show that learning-based techniques can localize faults effectively by leveraging various kinds of information. However, these techniques have two problems. First, they simply represent the various kinds of information without considering how much each kind contributes. Second, they do not address the data imbalance problem. Their effectiveness in practice is therefore limited. In this paper, we propose HetFL, a novel heterogeneous graph-based software fault localization technique that aggregates different information into a heterogeneous graph in which program entities and test cases are nodes, and coverage, change histories, and call relationships are edges. HetFL first extracts textual and structural information from source code as node attributes and integrates them into an attribute vector. Then, for a given node, HetFL finds its neighbor nodes by edge type and aggregates the neighbors of each type to form type vectors. Next, an attention mechanism aggregates the attribute vector and all the type vectors of each node into the final vector representation. Finally, we use a convolutional neural network (CNN) to obtain the suspiciousness score of each method. To validate the effectiveness of HetFL, we conduct experiments on the widely used dataset Defects4J (v1.2.0). The results show that HetFL localizes 217 faults within Top-1, 25 more than the state-of-the-art technique DeepFL, and achieves a mean average rank (MAR) of 6.37 and a mean first rank (MFR) of 5.58, improving on DeepFL by 9.0% and 5.6%, respectively. We also perform experiments on the latest version of Defects4J (v2.0.0), where HetFL performs better than the baseline methods.
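The representation steps in the abstract (an attribute vector per node, one type vector per edge type, attention-based fusion) can be illustrated with a small sketch. This is not the authors' implementation. The mean aggregator, the dot-product attention with a fixed `query` vector, the two-dimensional toy attributes, and all node and function names are assumptions made for illustration only. The abstract does not specify these details, and the final CNN scoring step is omitted.

```python
import math
from collections import defaultdict

# Edge types named in the abstract; the string labels are hypothetical.
EDGE_TYPES = ("coverage", "change_history", "call")


def mean_vec(vectors, dim):
    """Element-wise mean; a zero vector when a node has no such neighbors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def type_vectors(node, edges, attrs, dim):
    """Group the node's neighbors by edge type and average their attribute
    vectors (assumed aggregator), giving one type vector per edge type."""
    by_type = defaultdict(list)
    for src, etype, dst in edges:
        if src == node:
            by_type[etype].append(attrs[dst])
        elif dst == node:
            by_type[etype].append(attrs[src])
    return [mean_vec(by_type[t], dim) for t in EDGE_TYPES]


def attention_fuse(attr_vec, type_vecs, query):
    """Fuse the attribute vector and the type vectors with softmax weights
    from dot-product scores against `query` (fixed here, learned in practice)."""
    candidates = [attr_vec] + type_vecs
    scores = [sum(q * x for q, x in zip(query, c)) for c in candidates]
    weights = softmax(scores)
    dim = len(attr_vec)
    fused = [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)]
    return fused, weights


# Toy heterogeneous graph: two methods (m1, m2) and two test cases (t1, t2).
attrs = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "t1": [0.5, 0.5], "t2": [1.0, 1.0]}
edges = [("t1", "coverage", "m1"), ("t2", "coverage", "m1"), ("m1", "call", "m2")]

tvecs = type_vectors("m1", edges, attrs, dim=2)
fused, weights = attention_fuse(attrs["m1"], tvecs, query=[1.0, 1.0])
print(tvecs)
print(weights)
print(fused)
```

In this toy setup the attention weights sum to one, so the fused vector is a convex combination of the node's own attributes and its per-type neighborhood summaries. That is one simple way to let different information sources contribute unequally.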


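Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months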

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impact on software construction, analysis, or management. The scope of the Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.