Fault Localization via Efficient Probabilistic Modeling of Program Semantics

2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). Pub Date: 2022-05-01. DOI: 10.1145/3510003.3510073

Muhan Zeng, Yiqian Wu, Zhen Ye, Yingfei Xiong, Xin Zhang, Lu Zhang


Citations: 7

Abstract

Testing-based fault localization has been a significant topic in software engineering over the past decades. It localizes a faulty program element based on a set of passing and failing test executions. Because whether a test triggers and detects a fault depends on program semantics, it is crucial to model program semantics in fault localization approaches. Existing approaches either consider the full semantics of the program (e.g., mutation-based fault localization and angelic debugging), which leads to scalability issues, or ignore the semantics of the program (e.g., spectrum-based fault localization), which leads to imprecise localization results. Our key idea is that modeling only the correctness of program values, rather than their full semantics, strikes a balance between effectiveness and scalability. To realize this idea, we introduce a probabilistic approach to modeling program semantics that draws on information from static analysis and dynamic execution traces. Our approach, SmartFL (SeMantics bAsed pRobabilisTic Fault Localization), is evaluated on a real-world dataset, Defects4J. The top-1 statement-level accuracy of our approach is 21%, which is the best among state-of-the-art methods. The average time cost is 210 seconds per fault, while existing methods that capture full semantics are often 10x slower or more.
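To make the key idea concrete, the following is a minimal sketch of what "modeling only the correctness of program values" could look like on a toy example. It is not SmartFL's actual model or implementation: the three-statement program, the two test traces, the prior `PRIOR_FAULTY`, the masking probability `P_LUCKY`, and the brute-force enumeration are all assumptions chosen for illustration. Each statement carries a Boolean "faulty" variable, each value produced during a test carries a Boolean "correct" variable, and the observed pass/fail outcome of each test conditions the posterior.

```python
from itertools import product

# Hypothetical program and traces (illustrative only, not from the paper).
# Each trace step is (statement, value it produces, values it reads).
TRACES = {
    "failing": [("s1", "a", []), ("s2", "b", ["a"])],
    "passing": [("s1", "a", []), ("s3", "c", ["a"])],
}
# Observed correctness of the output value checked by each test.
OBSERVED = {"failing": ("b", False), "passing": ("c", True)}

PRIOR_FAULTY = 0.1  # assumed prior probability that a statement is faulty
P_LUCKY = 0.2       # assumed chance that a faulty statement or a wrong
                    # input still happens to yield a correct value


def p_correct(faulty, inputs_ok):
    """Probability that a produced value is correct."""
    if not faulty and inputs_ok:
        return 1.0  # correct statement on correct inputs
    return P_LUCKY  # coincidental correctness


def trace_likelihood(trace, observed, faulty):
    """P(observed test outcome | fault assignment), summing over the
    unobserved correctness of every value produced in the trace."""
    values = [out for _, out, _ in trace]
    name, expected = observed
    total = 0.0
    for bits in product([True, False], repeat=len(values)):
        ok = dict(zip(values, bits))
        if ok[name] != expected:
            continue  # inconsistent with the test oracle
        p = 1.0
        for stmt, out, reads in trace:
            pc = p_correct(faulty[stmt], all(ok[r] for r in reads))
            p *= pc if ok[out] else 1.0 - pc
        total += p
    return total


def posterior_faulty():
    """Posterior P(statement is faulty | all test outcomes), computed by
    exhaustive enumeration of fault assignments."""
    stmts = sorted({s for t in TRACES.values() for s, _, _ in t})
    joint = {s: 0.0 for s in stmts}
    evidence = 0.0
    for bits in product([True, False], repeat=len(stmts)):
        faulty = dict(zip(stmts, bits))
        p = 1.0
        for s in stmts:
            p *= PRIOR_FAULTY if faulty[s] else 1.0 - PRIOR_FAULTY
        for name, trace in TRACES.items():
            p *= trace_likelihood(trace, OBSERVED[name], faulty)
        evidence += p
        for s in stmts:
            if faulty[s]:
                joint[s] += p
    return {s: joint[s] / evidence for s in stmts}


if __name__ == "__main__":
    ranking = sorted(posterior_faulty().items(), key=lambda kv: -kv[1])
    for stmt, prob in ranking:
        print(f"{stmt}: {prob:.3f}")
```

In this toy setting, `s2` ranks first: `s1` is also executed by the passing test, which partly exonerates it, while `s3` is never on the failing trace. The model never evaluates concrete values such as integers or strings, only whether each value is correct, which is the kind of abstraction the abstract describes. Exhaustive enumeration is exponential in the number of statements and is used here only to keep the sketch short.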
