Yueming Wu;Wenqi Suo;Siyue Feng;Cong Wu;Deqing Zou;Hai Jin
IEEE Transactions on Reliability, vol. 74, no. 3, pp. 3427-3441. Published 2025-04-17. DOI: 10.1109/TR.2025.3550747
Impact factor: 5.7. JCR: Q1 (Computer Science, Hardware & Architecture).
https://ieeexplore.ieee.org/document/10967509/
Fine-Grained Code Clone Detection by Keywords-Based Connection of Program Dependency Graph
Code clone detection aims to identify functionally similar code fragments, a task of growing importance in contemporary software engineering. Many methods have been proposed for detecting code clones, among which graph-based approaches are effective at handling semantic clones. However, these approaches extract features from each sample in isolation and ignore the semantic connections between different samples, which leads to unsatisfactory detection performance. Moreover, most existing methods can only report whether a clone exists; they cannot indicate which lines of code are most similar. In this article, we propose a novel semantic clone detection method based on the program dependency graph (PDG), namely Keybor, which can locate specific cloned lines of code by providing a fine-grained analysis of clone pairs. The key idea is to use keywords as a bridge connecting the PDG nodes of the target program, thereby retaining more semantic information about the functional code. To examine the effectiveness of Keybor, we evaluate it on the widely used BigCloneBench dataset. Experimental results indicate that Keybor outperforms 14 advanced code clone detection tools (i.e., CCAligner, SourcererCC, Siamese, NIL, NiCad, LVMapper, CCFinder, CloneWorks, Oreo, Deckard, CCGraph, Code2Img, GPT-3.5-turbo, and GPT-4).
About the journal:
IEEE Transactions on Reliability is a refereed journal for reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering discipline that supports new and existing technologies, the journal constantly expands into new areas of the assurance sciences.