When functions change their names: automatic detection of origin relationships

12th Working Conference on Reverse Engineering (WCRE'05). Pub Date: 2005-11-07. DOI: 10.1109/WCRE.2005.33

Sunghun Kim, Kai Pan, E. J. Whitehead


Citations: 108

Abstract

It is commonly understood that identifying the same entity (such as a module, file, or function) across revisions is important for software-evolution analysis. Most software evolution researchers use entity names, such as file names and function names, as entity identifiers, assuming that each entity is uniquely identifiable by its name. Unfortunately, names change over time. In this paper, we propose an automated algorithm that maps entities at the function level across revisions, even when an entity's name changes in the new revision. The algorithm is based on computing function similarities. We introduce eight similarity factors to determine whether a function was renamed from another function. To find out which similarity factors are dominant, we perform a significance analysis on each factor. To validate our algorithm and support the factor significance analysis, ten human judges manually identified renamed entities across revisions of two open source projects: Subversion and Apache2. Using this manually identified result set, we trained weights for each similarity factor and measured the accuracy of our algorithm. We also computed the accuracy among the human judges, and found that our algorithm's accuracy is better than the average accuracy among human judges. We further show that similarity-factor weights trained on one period of one project are reusable for other periods and/or other projects. Finally, we evaluated every possible combination of factors and computed the accuracy of each. We found that adding more factors does not necessarily improve the accuracy of origin detection.
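To make the idea of weighted-similarity origin detection concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not name the eight factors or report the trained weights. The four factors below (name, signature text, body text, callee set), the use of `difflib.SequenceMatcher.ratio()` and Jaccard overlap as scores, the weights, and the 0.6 threshold are all hypothetical stand-ins. The `Function` type, `detect_origins`, and `evaluate_combinations` are likewise illustrative names. The sketch treats functions that disappear between two revisions as "deleted" and functions that newly appear as "added". For each added function, it picks the deleted function with the highest weighted score, provided that score clears the threshold. It also shows how every non-empty factor subset could be scored against hand-labeled origins.

```python
"""Sketch of weighted-similarity origin detection across two revisions.

All factors, weights, and the threshold are hypothetical stand-ins,
not the eight factors or the trained weights from the paper.
"""
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Function:
    name: str
    signature: str           # e.g. return type and parameter list, as text
    body: str
    callees: FrozenSet[str]  # names of functions this one calls


def _text_ratio(x: str, y: str) -> float:
    """Similarity in [0, 1] from difflib's matching-block ratio."""
    return SequenceMatcher(None, x, y).ratio()


def _jaccard(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


# Hypothetical factors (the paper uses eight; these four are illustrative).
FACTORS: Dict[str, Callable[[Function, Function], float]] = {
    "name": lambda a, b: _text_ratio(a.name, b.name),
    "signature": lambda a, b: _text_ratio(a.signature, b.signature),
    "body": lambda a, b: _text_ratio(a.body, b.body),
    "callees": lambda a, b: _jaccard(a.callees, b.callees),
}


def similarity(a: Function, b: Function, weights: Dict[str, float]) -> float:
    """Weighted average of the selected factor scores, in [0, 1]."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * FACTORS[f](a, b) for f, w in weights.items()) / total


def detect_origins(old: List[Function], new: List[Function],
                   weights: Dict[str, float],
                   threshold: float = 0.6) -> Dict[str, str]:
    """Map each added function's name to its most similar deleted function."""
    old_names = {f.name for f in old}
    new_names = {f.name for f in new}
    deleted = [f for f in old if f.name not in new_names]
    added = [f for f in new if f.name not in old_names]
    origins: Dict[str, str] = {}
    for candidate in added:
        scored = [(similarity(o, candidate, weights), o.name) for o in deleted]
        if scored:
            best_score, best_name = max(scored)
            if best_score >= threshold:
                origins[candidate.name] = best_name
    return origins


def evaluate_combinations(old: List[Function], new: List[Function],
                          weights: Dict[str, float], truth: Dict[str, str],
                          threshold: float = 0.6) -> Dict[Tuple[str, ...], float]:
    """Accuracy of every non-empty factor subset against labeled origins.

    An added function counts as correct when the predicted origin (or None)
    equals the labeled origin (or None, meaning "genuinely new").
    """
    old_names = {f.name for f in old}
    added = [f.name for f in new if f.name not in old_names]
    names = sorted(weights)
    results: Dict[Tuple[str, ...], float] = {}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            sub_weights = {f: weights[f] for f in subset}
            predicted = detect_origins(old, new, sub_weights, threshold)
            correct = sum(predicted.get(n) == truth.get(n) for n in added)
            results[subset] = correct / len(added) if added else 1.0
    return results
```

Suppose `parse_config` in one revision and `load_config` in the next have the same signature, body, and callees. Then `detect_origins` reports `load_config` as renamed from `parse_config`. An unrelated new function whose best score stays below the threshold is left unmapped. One design consequence is worth noting: nothing prevents a single deleted function from being chosen as the origin of several added functions. That could model a function split, but it is a simplification a real tool would need to revisit.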
