Evaluating the Impact of Experimental Assumptions in Automated Fault Localization

2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). Pub Date: 2023-05-01. DOI: 10.1109/ICSE48619.2023.00025

E. Soremekun, Lukas Kirschner, Marcel Böhme, Mike Papadakis



Abstract

Research on automated program debugging often assumes that bug fix location(s) indicate the faults' root causes and that the root causes of faults lie within single code elements (statements). It is also often assumed that the number of statements a developer would need to inspect before finding the first faulty statement reflects debugging effort. Although intuitive, these three assumptions are typically adopted (55% of experiments in surveyed publications make at least one of them) without any consideration of their effect on the debugger's effectiveness or their potential impact on developers in practice. To address this issue, we perform controlled experimentation, split testing in particular, using 352 bugs from 46 open-source C programs, 19 Automated Fault Localization (AFL) techniques (18 statistical debugging formulas and dynamic slicing), two state-of-the-art automated program repair (APR) techniques (GenProg and Angelix), and 76 professional developers. Our results show that these assumptions conceal the difficulty of debugging. They make AFL techniques appear to be (up to 38%) more effective, and make APR tools appear to be (2X) less effective. We also find that most developers (83%) consider these assumptions to be unsuitable for debuggers and, perhaps worse, that they may inhibit development productivity. The majority (66%) of developers prefer debugging diagnoses without these assumptions twice as much as with the assumptions. Our findings motivate the need to assess debuggers conservatively, i.e., without these assumptions.
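To make the effort assumption concrete, the following is a minimal sketch and not the paper's experimental setup. It ranks statements with Ochiai, a well-known statistical debugging formula chosen here purely for illustration (the abstract does not name the 18 formulas it evaluates), and then compares two effort measures: statements inspected until the first faulty statement, and statements inspected until every faulty statement has been seen. The coverage data, test outcomes, the two-statement fault, and all function names are hypothetical.

```python
import math


def ochiai(ef, ep, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)).

    ef / ep are the numbers of failing / passing tests that execute the statement.
    """
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0


def rank_statements(coverage, outcomes):
    """Return statement ids ordered from most to least suspicious.

    coverage: dict mapping statement id -> set of test ids that execute it
    outcomes: dict mapping test id -> True if the test passed
    """
    failed = {t for t, passed in outcomes.items() if not passed}
    scores = {}
    for stmt, tests in coverage.items():
        ef = len(tests & failed)
        ep = len(tests - failed)
        scores[stmt] = ochiai(ef, ep, len(failed))
    # Ties are broken by statement id so the ranking is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))


def effort_until_first(ranking, faulty):
    """Statements inspected before (and including) the first faulty one."""
    return next(i for i, s in enumerate(ranking, 1) if s in faulty)


def effort_until_all(ranking, faulty):
    """Statements inspected until every faulty statement has been seen."""
    return max(i for i, s in enumerate(ranking, 1) if s in faulty)


# Hypothetical data: six statements, four tests, and a fault spanning two statements.
outcomes = {"t1": True, "t2": True, "t3": False, "t4": False}
coverage = {
    1: {"t1", "t2", "t3", "t4"},
    2: {"t3", "t4"},
    3: {"t1", "t3"},
    4: {"t2", "t3", "t4"},
    5: {"t1", "t2", "t4"},
    6: {"t1", "t2"},
}
faulty = {2, 5}

ranking = rank_statements(coverage, outcomes)
first = effort_until_first(ranking, faulty)
complete = effort_until_all(ranking, faulty)
print(ranking, first, complete)
```

In this toy example the first faulty statement sits at the top of the ranking, so the first-statement measure reports an effort of 1, while reaching both faulty statements requires inspecting 5 of the 6 statements. The gap is the kind of difference that the single-statement and first-faulty-statement assumptions can hide.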
