Are Bug Reports Enough for Text Retrieval-Based Bug Localization?

2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 381-392 Pub Date: 2018-09-01 DOI: 10.1109/ICSME.2018.00046

Chris Mills, Jevgenija Pantiuchina, Esteban Parra, G. Bavota, S. Haiduc

Citations: 43

Abstract

Text Retrieval (TR) has been widely used to support many software engineering tasks, including bug localization (i.e., the activity of localizing buggy code starting from a bug report). Many studies show TR's effectiveness in lowering the manual effort required to perform this maintenance task; however, the actual usefulness of TR-based bug localization has been questioned in recent studies. These studies discuss (i) potential biases in the experimental design usually adopted to evaluate TR-based bug localization techniques and (ii) their poor performance in the scenario where they are needed most: when the bug report, which serves as the de facto query in most studies, does not contain localization hints (e.g., code snippets, method names, etc.). Fundamentally, these studies raise the question: do bug reports provide sufficient information to perform TR-based localization? In this work, we approach that question from two perspectives. First, we investigate potential biases in the evaluation of TR-based approaches which artificially boost the performance of these techniques, making them appear more successful than they are. Second, we analyze bug report text with and without localization hints using a genetic algorithm to derive a near-optimal query that provides insight into the potential of that bug report for use in TR-based localization. Through this analysis we show that in most cases the bug report vocabulary (i.e., the terms contained in the bug title and description) is all we need to formulate effective queries, making TR-based bug localization successful without supplementary query expansion. Most notably, this also holds when localization hints are completely absent from the bug report. In fact, our results suggest that the next major step in improving TR-based bug localization is the ability to formulate these near-optimal queries.
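To make the idea of deriving a near-optimal query from bug report vocabulary concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy corpus, a simple TF-IDF style score, and a fitness function that uses the known buggy file as an oracle; the file names, terms, GA parameters, and helper names (`rank_of`, `evolve`) are all hypothetical. Each individual is a bitmask selecting a subset of the bug report terms, and the genetic algorithm searches for the subset that ranks the buggy file highest.

```python
import math
import random

# Hypothetical toy corpus: file name -> tokens extracted from that file.
CORPUS = {
    "CacheManager.java": "cache evict entry size limit manager".split(),
    "LoginDialog.java": "login dialog user password button error".split(),
    "SessionStore.java": "session user token expire store error".split(),
    "ReportPrinter.java": "report print page error dialog".split(),
}
# Hypothetical bug report vocabulary (title + description terms) and buggy file.
BUG_REPORT_TERMS = ["error", "user", "password", "expire", "login", "page"]
TARGET = "LoginDialog.java"


def idf(term):
    """Smoothed inverse document frequency over the toy corpus."""
    df = sum(term in tokens for tokens in CORPUS.values())
    return math.log((1 + len(CORPUS)) / (1 + df)) + 1.0


def score(query, tokens):
    """Simple TF-IDF style relevance of one file to a query."""
    return sum(tokens.count(t) * idf(t) for t in query)


def rank_of(target, query):
    """1-based rank of the target file; ties are resolved against the target."""
    ordered = sorted(CORPUS, key=lambda f: (-score(query, CORPUS[f]), f == target))
    return ordered.index(target) + 1


def evolve(terms, target, generations=30, pop_size=20, seed=0):
    """Search term subsets (bitmasks) for the query that ranks `target` best."""
    rng = random.Random(seed)

    def decode(mask):
        return [t for t, bit in zip(terms, mask) if bit]

    def fitness(mask):
        # Lower is better: first the rank of the buggy file, then query length.
        return (rank_of(target, decode(mask)), sum(mask))

    # Seed the population with the full bug report so the result is never worse.
    population = [[1] * len(terms)]
    population += [[rng.randint(0, 1) for _ in terms] for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # elitism: keep the two best individuals
        while len(next_gen) < pop_size:
            a = min(rng.sample(population, 3), key=fitness)  # tournament selection
            b = min(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, len(terms))               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < 0.05) for bit in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen

    best = min(population, key=fitness)
    return decode(best), rank_of(target, decode(best))
```

Calling `evolve(BUG_REPORT_TERMS, TARGET)` returns the selected terms and the rank they achieve for the buggy file. Because the fitness relies on knowing the buggy file in advance, a search of this kind measures the potential of a bug report's vocabulary rather than offering a technique usable at localization time.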



