Clone detection using abstract syntax trees

Proceedings. International Conference on Software Maintenance (Cat. No. 98CB36272). Pub Date: 1998-03-16. DOI: 10.1109/ICSM.1998.738528

I. Baxter, A. Yahin, L. D. Moura, Marcelo Sant'Anna, Lorraine Bier

Citations: 1427

Abstract

Existing research suggests that a considerable fraction (5-10%) of the source code of large-scale computer programs is duplicate code ("clones"). Detection and removal of such clones promises decreased software maintenance costs of possibly the same magnitude. Previous work was limited to detection of either near misses differing only in single lexemes, or near misses only between complete functions. The paper presents simple and practical methods for detecting exact and near-miss clones over arbitrary program fragments in program source code by using abstract syntax trees. Previous work also did not suggest practical means for removing detected clones. Since our methods operate in terms of the program structure, clones could be removed by mechanical methods producing in-lined procedures or standard preprocessor macros. A tool using these techniques is applied to a C production software system of some 400K source lines, and the results confirm the levels of duplication found by previous work. The tool produces macro bodies needed for clone removal, and macro invocations to replace the clones. The tool uses a variation of the well-known compiler method for detecting common subexpressions. This method determines exact tree matches; a number of adjustments are needed to detect equivalent statement sequences, commutative operands, and nearly exact matches. We additionally suggest that clone detection could also be useful in producing more structured code, and in reverse engineering to discover domain concepts and their implementations.
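The abstract describes finding clones with a variation of the compiler technique for common subexpressions: group subtrees by a hash, then compare candidates, with adjustments for commutative operands and near misses. What follows is only a minimal Python sketch of that idea under stated assumptions, not the authors' tool. It parses Python with the standard `ast` module rather than C. Its bucket key simply discards identifiers and literals. It sorts the operands of `+`, `*`, `&`, `|` and `^` regardless of operand type. It scores candidate pairs with a similarity of the form 2S / (2S + L + R), where S counts shared nodes and L and R count nodes unique to each fragment (the measure commonly attributed to this paper), approximated here over multisets of node labels instead of an aligned tree comparison. The names `shape`, `labels`, `similarity`, `find_clones`, `COMMUTATIVE`, `min_mass` and `threshold` are hypothetical. Statement-sequence detection and macro generation are not attempted.

```python
import ast
from collections import Counter, defaultdict

# Hypothetical choice: operators whose operands are put in canonical order.
# Operand types are ignored, so string "+" is treated as commutative even
# though concatenation is not.
COMMUTATIVE = (ast.Add, ast.Mult, ast.BitAnd, ast.BitOr, ast.BitXor)


def _field(value):
    """Bucket-key fragment for one AST field; names and other leaves become '_'."""
    if isinstance(value, ast.AST):
        return shape(value)
    if isinstance(value, list):
        return tuple(_field(v) for v in value)
    return None if value is None else "_"


def shape(node):
    """Hashable bucket key for a subtree.

    Identifiers and literals are dropped, so fragments that differ only in
    names or constants share a key (near-miss candidates). Operands of
    commutative operators are sorted so that `a + b` and `b + a` match.
    """
    if isinstance(node, ast.Name):
        return ("Name",)
    if isinstance(node, ast.Constant):
        return ("Const",)
    if isinstance(node, ast.BinOp):
        left, right = shape(node.left), shape(node.right)
        if isinstance(node.op, COMMUTATIVE):
            left, right = sorted((left, right), key=repr)
        return ("BinOp", type(node.op).__name__, left, right)
    return (type(node).__name__,) + tuple(
        _field(v) for _, v in ast.iter_fields(node)
        if not isinstance(v, ast.expr_context)
    )


def mass(node):
    """Number of AST nodes in the subtree; tiny fragments are skipped."""
    return sum(1 for _ in ast.walk(node))


def labels(node):
    """Multiset of node labels that keeps identifiers and literal values."""
    out = Counter()
    for n in ast.walk(node):
        if isinstance(n, ast.Name):
            out["Name:" + n.id] += 1
        elif isinstance(n, ast.arg):
            out["arg:" + n.arg] += 1
        elif isinstance(n, ast.Constant):
            out["Const:" + repr(n.value)] += 1
        elif not isinstance(n, ast.expr_context):
            out[type(n).__name__] += 1
    return out


def similarity(a, b):
    """2S / (2S + L + R), with S, L, R counted over label multisets."""
    la, lb = labels(a), labels(b)
    shared = sum((la & lb).values())  # labels present in both fragments
    only_a = sum((la - lb).values())  # labels only in a
    only_b = sum((lb - la).values())  # labels only in b
    return 2 * shared / (2 * shared + only_a + only_b)


def find_clones(source, min_mass=6, threshold=0.5):
    """Return (line_a, line_b, node_type, similarity) for candidate clone pairs."""
    buckets = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.stmt, ast.expr)) and mass(node) >= min_mass:
            buckets[shape(node)].append(node)
    pairs = []
    for group in buckets.values():  # only fragments in the same bucket are compared
        for i, x in enumerate(group):
            for y in group[i + 1:]:
                sim = similarity(x, y)
                if sim >= threshold:
                    a, b = sorted((x.lineno, y.lineno))
                    pairs.append((a, b, type(x).__name__, round(sim, 2)))
    return sorted(pairs)


if __name__ == "__main__":
    demo = (
        "def f(a, b):\n    total = a * 2 + b\n    return total\n\n"
        "def g(x, y):\n    s = y + x * 2\n    return s\n"
    )
    for pair in find_clones(demo):
        print(pair)
```

On the two-function demo, the sketch reports `f`/`g`, their assignments, and the right-hand-side expressions as near-miss pairs (similarity 0.6, 0.67 and 0.71), even though `y + x * 2` writes the commutative operands in the opposite order. Raising `threshold` filters out weaker matches. The sketch also keeps pairs nested inside larger clones, which a practical tool would typically prune before proposing macros.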
