SCDetector

Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering Pub Date : 2020-12-21 DOI:10.1145/3324884.3416562

Yueming Wu, Deqing Zou, Shihan Dou, Sirui Yang, Wei Yang, Feng Cheng, Hong Liang, Hai Jin

{"title":"SCDetector","authors":"Yueming Wu, Deqing Zou, Shihan Dou, Sirui Yang, Wei Yang, Feng Cheng, Hong Liang, Hai Jin","doi":"10.1145/3324884.3416562","DOIUrl":null,"url":null,"abstract":"Code clone detection is to find out code fragments with similar functionalities, which has been more and more important in software engineering. Many approaches have been proposed to detect code clones, in which token-based methods are the most scalable but cannot handle semantic clones because of the lack of consideration of program semantics. To address the issue, researchers conduct program analysis to distill the program semantics into a graph representation and detect clones by matching the graphs. However, such approaches suffer from low scalability since graph matching is typically time-consuming. In this paper, we propose SCDetector to combine the scalability of token-based methods with the accuracy of graph-based methods for software functional clone detection. Given a function source code, we first extract the control flow graph by static analysis. Instead of using traditional heavyweight graph matching, we treat the graph as a social network and apply social-network-centrality analysis to dig out the centrality of each basic block. Then we assign the centrality to each token in a basic block and sum the centrality ofthe same token in different basic blocks. By this, a graph is turned into certain tokens with graph details (i.e., centrality), called semantic tokens. Finally, these semantic tokens are fed into a Siamese architecture neural network to train a code clone detector. We evaluate SCDetector on two large datasets of functionally similar code. Experimental results indicate that our system is superior to four state-of-the-art methods (i.e., SourcererCC, Deckard, RtvNN, and ASTNN) and the time cost of SCDetector is 14 times less than a traditional graph-based method (i.e., CCSharp) on detecting semantic clones.","PeriodicalId":267160,"journal":{"name":"Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"SCDetector\",\"authors\":\"Yueming Wu, Deqing Zou, Shihan Dou, Sirui Yang, Wei Yang, Feng Cheng, Hong Liang, Hai Jin\",\"doi\":\"10.1145/3324884.3416562\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Code clone detection is to find out code fragments with similar functionalities, which has been more and more important in software engineering. Many approaches have been proposed to detect code clones, in which token-based methods are the most scalable but cannot handle semantic clones because of the lack of consideration of program semantics. To address the issue, researchers conduct program analysis to distill the program semantics into a graph representation and detect clones by matching the graphs. However, such approaches suffer from low scalability since graph matching is typically time-consuming. In this paper, we propose SCDetector to combine the scalability of token-based methods with the accuracy of graph-based methods for software functional clone detection. Given a function source code, we first extract the control flow graph by static analysis. Instead of using traditional heavyweight graph matching, we treat the graph as a social network and apply social-network-centrality analysis to dig out the centrality of each basic block. Then we assign the centrality to each token in a basic block and sum the centrality ofthe same token in different basic blocks. By this, a graph is turned into certain tokens with graph details (i.e., centrality), called semantic tokens. Finally, these semantic tokens are fed into a Siamese architecture neural network to train a code clone detector. We evaluate SCDetector on two large datasets of functionally similar code. Experimental results indicate that our system is superior to four state-of-the-art methods (i.e., SourcererCC, Deckard, RtvNN, and ASTNN) and the time cost of SCDetector is 14 times less than a traditional graph-based method (i.e., CCSharp) on detecting semantic clones.\",\"PeriodicalId\":267160,\"journal\":{\"name\":\"Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3324884.3416562\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3324884.3416562","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 39

摘要

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

SCDetector

Code clone detection is to find out code fragments with similar functionalities, which has been more and more important in software engineering. Many approaches have been proposed to detect code clones, in which token-based methods are the most scalable but cannot handle semantic clones because of the lack of consideration of program semantics. To address the issue, researchers conduct program analysis to distill the program semantics into a graph representation and detect clones by matching the graphs. However, such approaches suffer from low scalability since graph matching is typically time-consuming. In this paper, we propose SCDetector to combine the scalability of token-based methods with the accuracy of graph-based methods for software functional clone detection. Given a function source code, we first extract the control flow graph by static analysis. Instead of using traditional heavyweight graph matching, we treat the graph as a social network and apply social-network-centrality analysis to dig out the centrality of each basic block. Then we assign the centrality to each token in a basic block and sum the centrality ofthe same token in different basic blocks. By this, a graph is turned into certain tokens with graph details (i.e., centrality), called semantic tokens. Finally, these semantic tokens are fed into a Siamese architecture neural network to train a code clone detector. We evaluate SCDetector on two large datasets of functionally similar code. Experimental results indicate that our system is superior to four state-of-the-art methods (i.e., SourcererCC, Deckard, RtvNN, and ASTNN) and the time cost of SCDetector is 14 times less than a traditional graph-based method (i.e., CCSharp) on detecting semantic clones.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering

自引率

0.00%

发文量