On the Influence of Biases in Bug Localization: Evaluation and Benchmark

Ratnadira Widyasari, S. A. Haryono, Ferdian Thung, Jieke Shi, Constance Tan, Fiona Wee, Jack Phan, David Lo
{"title":"On the Influence of Biases in Bug Localization: Evaluation and Benchmark","authors":"Ratnadira Widyasari, S. A. Haryono, Ferdian Thung, Jieke Shi, Constance Tan, Fiona Wee, Jack Phan, David Lo","doi":"10.1109/saner53432.2022.00027","DOIUrl":null,"url":null,"abstract":"Bug localization is the task of identifying parts of the source code that needs to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report datasets. In 2014, Kochhar et al. identified three biases in datasets used to evaluate bug localization techniques: (1) misclassified bug report, (2) already localized bug report, and (3) incorrect ground truth file in a bug report. They reported that already localized bug reports statistically significantly and substantially impact bug localization results, and thus should be removed. However, their evaluation is still limited, as they only investigated 3 projects written in Java. In this study, we replicate the study of Kochhar et al. on the effect of biases in bug report dataset for bug localization. Further investigation on this topic is necessary as new and larger bug report datasets have been proposed without being checked for these biases. We conduct our analysis on a collection of 2,913 bug reports taken from the recently released Bugzbook dataset that fix Python files. To investigate the prevalence of the biases, we check the bias distributions. For each bias, we select and label a set of bug reports that may contain the bias and compute the proportion of bug reports in the set that exhibit the bias. We find that 5%, 23%, and 30% of the bug reports that we investigated are affected by biases 1, 2, and 3 respectively. Then, we investigate the effect of the three biases on bug localization by measuring the performance of IncBL, a recent bug localization tool, and the classical Vector Space Model (VSM) based bug localization tool, which was used in the Kochhar et al. study. Our experiment results highlight that bias 2 significantly impact the bug localization results, while bias 1 and 3 do not have a significant impact. We also find that the effect sizes of bias 2 to IncBL and VSM are different, where IncBL has a higher effect size than VSM. Our findings corroborate the result reported by Kochhar et al. and demonstrate that bias 2 not only affects the 3 Java projects investigated in their study, but also others in another programming language (i.e., Python). This highlights the need to eliminate bias 2 from the evaluation of future bug localization tools. As a by-product of our replication study, we have released a benchmark dataset, which we refer to as CAPTURED, that has been cleaned from the three biases. 
CAPTURED contains Python programs and therefore augments the cleaned dataset released by Kochhar et al., which only contains Java programs.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/saner53432.2022.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Bug localization is the task of identifying the parts of the source code that need to be changed to resolve a bug report. As this task is difficult, automatic bug localization tools have been proposed. The development and evaluation of these tools rely on the availability of high-quality bug report datasets. In 2014, Kochhar et al. identified three biases in datasets used to evaluate bug localization techniques: (1) misclassified bug reports, (2) already localized bug reports, and (3) incorrect ground truth files in bug reports. They reported that already localized bug reports have a statistically significant and substantial impact on bug localization results and should therefore be removed. However, their evaluation is still limited, as they investigated only three projects written in Java. In this study, we replicate the study of Kochhar et al. on the effect of biases in bug report datasets on bug localization. Further investigation of this topic is necessary because new and larger bug report datasets have been proposed without being checked for these biases. We conduct our analysis on a collection of 2,913 bug reports whose fixes modify Python files, taken from the recently released Bugzbook dataset. To investigate the prevalence of the biases, we check the bias distributions: for each bias, we select and label a set of bug reports that may contain the bias and compute the proportion of bug reports in the set that exhibit it. We find that 5%, 23%, and 30% of the investigated bug reports are affected by biases 1, 2, and 3, respectively. We then investigate the effect of the three biases on bug localization by measuring the performance of IncBL, a recent bug localization tool, and of the classical Vector Space Model (VSM) based bug localization tool used in the Kochhar et al. study. Our experimental results highlight that bias 2 significantly impacts bug localization results, while biases 1 and 3 do not have a significant impact. We also find that the effect sizes of bias 2 on IncBL and VSM differ, with a larger effect size on IncBL than on VSM. Our findings corroborate the results reported by Kochhar et al. and demonstrate that bias 2 affects not only the three Java projects investigated in their study but also projects in another programming language (i.e., Python). This highlights the need to eliminate bias 2 from the evaluation of future bug localization tools. As a by-product of our replication study, we have released a benchmark dataset, which we refer to as CAPTURED, that has been cleansed of the three biases. CAPTURED contains Python programs and therefore augments the cleaned dataset released by Kochhar et al., which contains only Java programs.
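To make the classical VSM baseline concrete, the sketch below ranks a project's Python files by TF-IDF cosine similarity to the bug report text, which is the core idea behind VSM-based bug localization. This is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: it assumes scikit-learn is available, and the repository path and example report text are hypothetical placeholders.

# Minimal sketch of VSM-based bug localization: rank source files by
# TF-IDF cosine similarity to the bug report text. Not the authors'
# implementation; paths and the example report are hypothetical.
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_files(bug_report: str, repo_dir: str, top_k: int = 10):
    """Return the top_k source files most similar to the bug report."""
    paths = sorted(Path(repo_dir).rglob("*.py"))
    docs = [p.read_text(encoding="utf-8", errors="ignore") for p in paths]

    # Fit TF-IDF on the corpus of source files, then project the bug
    # report into the same vector space so similarities are comparable.
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w+", lowercase=True)
    file_vecs = vectorizer.fit_transform(docs)
    report_vec = vectorizer.transform([bug_report])

    # Cosine similarity between the report and every file; higher means
    # the file is more likely to need a change for this report.
    scores = cosine_similarity(report_vec, file_vecs).ravel()
    ranked = sorted(zip(paths, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    report = "Crash in config parser when the YAML file has duplicate keys"
    for path, score in rank_files(report, "path/to/project"):
        print(f"{score:.3f}  {path}")

With per-report metrics (e.g., the rank of the first correct file) from such rankings computed before and after removing a given bias from the dataset, the kind of significance and effect-size comparison reported above reduces to a standard paired statistical test.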