CCDive: A Deep Dive into Code Clone Detection Using Local Sequence Alignment
Authors: Yasir Glani; Luo Ping; Syed Asad Shah; Lin Ke
Journal: Tsinghua Science and Technology, vol. 30, no. 4, pp. 1435-1456 (Journal Article, published 2025-03-03)
DOI: 10.26599/TST.2024.9010075
URL: https://ieeexplore.ieee.org/document/10908670/
Impact factor: 6.6 (JCR Q1, Multidisciplinary; Region 1, Computer Science)
Citations: 0
Abstract
The rapid evolution of software development has exposed the deficiencies of prevailing code clone detection techniques. As modern applications grow more complex, traditional clone detection tools often struggle to detect general and large-gap clones that undergo regular modification. Such challenges threaten software integrity and underline the need for improved clone detection techniques. To address this gap, we propose code clone dive (CCDive), a clone detection technique designed to detect a wide range of clones, from direct clones to the often challenging large-gap clones, covering categories such as very strongly Type-III, strongly Type-III, and moderately Type-III clones. In CCDive, the combination of level-by-level abstraction and a similarity matching algorithm enables the recognition of clones even when nearly half of the original code in a chunk has been modified. Furthermore, by integrating Smith-Waterman local sequence alignment, CCDive can pinpoint the exact locations of code transformations. In a comprehensive evaluation, CCDive was compared with well-known clone detection techniques, and its efficacy was measured using precision, recall, F1-score, accuracy, and efficiency. CCDive consistently surpassed the other techniques in precision, recall, F1-score, and accuracy for both file-based and function-based clone detection. This robust performance indicates that CCDive is effective, reliable, accurate, and efficient, making it well suited to practical, real-world applications.
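To illustrate the local alignment step mentioned in the abstract, the following is a minimal sketch of textbook Smith-Waterman alignment applied to two token sequences. It is not the authors' implementation: the abstract does not describe CCDive's tokenization, abstraction levels, or scoring scheme, so the function name `smith_waterman`, the linear gap penalty, and the scores (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration only.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def smith_waterman(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 2,      # assumed score for equal tokens
    mismatch: int = -1,  # assumed penalty for differing tokens
    gap: int = -1,       # assumed linear penalty for an inserted/deleted token
) -> Tuple[int, List[Pair]]:
    """Return the best local alignment score and the aligned token pairs.

    A gap is reported as None on the side where a token is missing.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a fresh local alignment
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # token of a has no partner
                score[i][j - 1] + gap,      # token of b has no partner
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    pairs: List[Pair] = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


if __name__ == "__main__":
    original = ["a", "b", "c", "d"]
    modified = ["a", "b", "x", "c", "d"]  # one token inserted
    best_score, alignment = smith_waterman(original, modified)
    print(best_score)
    for left, right in alignment:
        print(left, right)
```

In the returned alignment, a pair with `None` on one side marks a token that was inserted or deleted, and a pair of two different tokens marks a substitution. This is how a local alignment can report where two fragments diverge, in addition to how similar they are.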
About the journal:
Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.