Fine-Grained Code Clone Detection by Keywords-Based Connection of Program Dependency Graph

IF 5.7 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Hardware & Architecture)
Yueming Wu;Wenqi Suo;Siyue Feng;Cong Wu;Deqing Zou;Hai Jin
DOI: 10.1109/TR.2025.3550747
Journal: IEEE Transactions on Reliability, vol. 74, no. 3, pp. 3427-3441
Publication date: 2025-04-17
Publication type: Journal Article
Open access: No
Source: https://ieeexplore.ieee.org/document/10967509/ (indexed via Semantic Scholar)
Citations: 0

Abstract

Fine-Grained Code Clone Detection by Keywords-Based Connection of Program Dependency Graph
Code clone detection aims to identify functionally similar code fragments, a task of growing importance in contemporary software engineering. Many methods have been proposed for detecting code clones, among which graph-based approaches are effective at handling semantic clones. However, these approaches extract features from each sample in isolation and ignore the semantic connections between different samples, which leads to unsatisfactory detection results. Moreover, most existing methods can only determine whether a clone exists and cannot indicate which lines of code are most similar. In this article, we propose a novel semantic clone detection method based on the program dependency graph (PDG), namely Keybor, which locates specific cloned lines of code by providing a fine-grained analysis of clone pairs. The key idea of the approach is to use keywords as a bridge to connect PDG nodes of the target program, retaining more semantic information about the functional code. To examine the effectiveness of Keybor, we evaluate it on the widely used BigCloneBench dataset. Experimental results indicate that Keybor is superior to 14 advanced code clone detection tools (i.e., CCAligner, SourcererCC, Siamese, NIL, NiCad, LVMapper, CCFinder, CloneWorks, Oreo, Deckard, CCGraph, Code2Img, GPT-3.5-turbo, and GPT-4).
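The abstract gives no algorithmic detail, so the following is only an illustrative sketch of one possible reading of "keywords as a bridge", not Keybor's actual implementation. It assumes each PDG is reduced to a mapping from line number to statement text (dependency edges are left out), and it links statements from two code fragments whenever they contain the same keyword. The keyword set, the function names `keywords_in` and `connect_by_keywords`, and the input format are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical keyword set: the abstract does not say which keywords Keybor uses.
KEYWORDS = {"if", "else", "for", "while", "return", "try", "catch", "new"}


def keywords_in(statement):
    """Return the keywords that occur as whole identifiers in a statement."""
    tokens = re.findall(r"[A-Za-z_]\w*", statement)
    return {t for t in tokens if t in KEYWORDS}


def connect_by_keywords(pdg_a, pdg_b):
    """Bridge two simplified PDGs through the keywords they share.

    Each PDG is given as {line_number: statement_text}; real PDGs also
    carry control and data dependency edges, which this sketch omits.
    Returns {keyword: ([lines in A], [lines in B])}, keeping only the
    keywords that appear in both fragments.
    """
    index = defaultdict(lambda: ([], []))
    for side, pdg in enumerate((pdg_a, pdg_b)):
        for line, stmt in sorted(pdg.items()):
            for kw in keywords_in(stmt):
                index[kw][side].append(line)
    return {kw: lines for kw, lines in index.items() if lines[0] and lines[1]}


# Two toy fragments that both sum the integers below n.
fragment_a = {
    1: "int s = 0;",
    2: "for (int i = 0; i < n; i++)",
    3: "s += i;",
    4: "return s;",
}
fragment_b = {
    1: "int t = 0;",
    2: "int j = 0;",
    3: "while (j < n) {",
    4: "t += j; j++; }",
    5: "return t;",
}
bridges = connect_by_keywords(fragment_a, fragment_b)
print(bridges)
```

In this toy pair only `return` occurs in both fragments, so it is the only bridge, linking line 4 of the first fragment to line 5 of the second. A line-level report of this kind is the sort of output a fine-grained detector could expose, although how Keybor scores and selects cloned lines is not described in the abstract.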
Source journal
IEEE Transactions on Reliability (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 12.20
Self-citation rate: 8.50%
Articles published: 153
Review time: 7.5 months
Journal description: IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.