使用机器学习的语义克隆检测

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA) Pub Date : 2016-12-01 DOI:10.1109/ICMLA.2016.0185

Abdullah M. Sheneamer, J. Kalita

{"title":"使用机器学习的语义克隆检测","authors":"Abdullah M. Sheneamer, J. Kalita","doi":"10.1109/ICMLA.2016.0185","DOIUrl":null,"url":null,"abstract":"If two fragments of source code are identical to each other, they are called code clones. Code clones introduce difficulties in software maintenance and cause bug propagation. In this paper, we present a machine learning framework to automatically detect clones in software, which is able to detect Types-3 and the most complicated kind of clones, Type-4 clones. Previously used traditional features are often weak in detecting the semantic clones The novel aspects of our approach are the extraction of features from abstract syntax trees (AST) and program dependency graphs (PDG), representation of a pair of code fragments as a vector and the use of classification algorithms. The key benefit of this approach is that our approach can find both syntactic and semantic clones extremely well. Our evaluation indicates that using our new AST and PDG features is a viable methodology, since they improve detecting clones on the IJaDataset 2.0.","PeriodicalId":356182,"journal":{"name":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":"{\"title\":\"Semantic Clone Detection Using Machine Learning\",\"authors\":\"Abdullah M. Sheneamer, J. Kalita\",\"doi\":\"10.1109/ICMLA.2016.0185\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"If two fragments of source code are identical to each other, they are called code clones. Code clones introduce difficulties in software maintenance and cause bug propagation. In this paper, we present a machine learning framework to automatically detect clones in software, which is able to detect Types-3 and the most complicated kind of clones, Type-4 clones. Previously used traditional features are often weak in detecting the semantic clones The novel aspects of our approach are the extraction of features from abstract syntax trees (AST) and program dependency graphs (PDG), representation of a pair of code fragments as a vector and the use of classification algorithms. The key benefit of this approach is that our approach can find both syntactic and semantic clones extremely well. Our evaluation indicates that using our new AST and PDG features is a viable methodology, since they improve detecting clones on the IJaDataset 2.0.\",\"PeriodicalId\":356182,\"journal\":{\"name\":\"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"48\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2016.0185\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2016.0185","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 48

摘要

如果两个源代码片段彼此相同，它们被称为代码克隆。代码克隆给软件维护带来困难，并导致bug传播。在本文中，我们提出了一个在软件中自动检测克隆的机器学习框架，该框架能够检测类型3和最复杂的类型4克隆。我们的方法的新颖之处是从抽象语法树(AST)和程序依赖图(PDG)中提取特征，将一对代码片段表示为向量，以及使用分类算法。这种方法的主要优点是，我们的方法可以非常好地找到语法和语义克隆。我们的评估表明，使用新的AST和PDG特性是一种可行的方法，因为它们改进了ijadatasset 2.0上的克隆检测。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Semantic Clone Detection Using Machine Learning

If two fragments of source code are identical to each other, they are called code clones. Code clones introduce difficulties in software maintenance and cause bug propagation. In this paper, we present a machine learning framework to automatically detect clones in software, which is able to detect Types-3 and the most complicated kind of clones, Type-4 clones. Previously used traditional features are often weak in detecting the semantic clones The novel aspects of our approach are the extraction of features from abstract syntax trees (AST) and program dependency graphs (PDG), representation of a pair of code fragments as a vector and the use of classification algorithms. The key benefit of this approach is that our approach can find both syntactic and semantic clones extremely well. Our evaluation indicates that using our new AST and PDG features is a viable methodology, since they improve detecting clones on the IJaDataset 2.0.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)

自引率

0.00%

发文量