Can Better Identifier Splitting Techniques Help Feature Location?

2011 IEEE 19th International Conference on Program Comprehension | Pub Date: 2011-06-22 | DOI: 10.1109/ICPC.2011.47

Bogdan Dit, Latifa Guerrouj, D. Poshyvanyk, G. Antoniol


Citations: 93

Abstract

The paper presents an exploratory study of two feature location techniques that use three strategies for splitting identifiers: Camel Case, Samurai, and manual splitting. The main research question we ask is: if we had a perfect technique for splitting identifiers, would it still help improve the accuracy of feature location techniques applied in different scenarios and settings? To answer this question, we investigate two feature location techniques, one based on Information Retrieval and the other on a combination of Information Retrieval and dynamic analysis, for locating bugs and features using various configurations of preprocessing strategies on two open-source systems, Rhino and jEdit. The results of an extensive empirical evaluation reveal that feature location techniques using Information Retrieval can benefit from better preprocessing algorithms in some cases, and that in those cases their improvement in effectiveness when using manual splitting over state-of-the-art approaches is statistically significant. However, the results for the feature location technique that combines Information Retrieval and dynamic analysis do not show any improvement when using manual splitting, indicating that any preprocessing technique will suffice if execution data is available. Overall, our findings outline the potential benefits of putting additional research effort into defining more sophisticated source code preprocessing techniques, as they can still be useful in situations where execution information cannot be easily collected.
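To make the simplest of the three strategies concrete, the following is a minimal sketch of camel-case identifier splitting as a preprocessing step before indexing source code for retrieval. It is an illustration only, not the implementation used in the study: the function name `split_camel_case` and the regular expression are assumptions introduced here.

```python
import re
from typing import List

# Hypothetical tokenizer (not from the paper). It recognizes, in order:
#   1. a run of capitals not followed by a lowercase letter (acronyms such as "XML"),
#   2. an optional capital followed by lowercase letters (ordinary words),
#   3. a run of digits.
# Underscores and other separators are simply skipped.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_camel_case(identifier: str) -> List[str]:
    """Split an identifier on case transitions, digits, and separators."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


if __name__ == "__main__":
    for name in ["getXMLParser", "file_name2", "userid"]:
        print(name, "->", split_camel_case(name))
```

An identifier written in a single case, such as `userid`, comes back as one token because it has no case transition to split on. That is the kind of identifier where a purely syntactic rule like this one falls short and where a more sophisticated or manual split could produce different terms.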
