Semantic code clone detection based on BERT pre-trained model
Zekai Cheng, Jiahao Hu, Yongkang Guo, Xiaoke Li
International Conference on Algorithms, Microchips and Network Applications, volume 44, pages 131711K - 131711K-7
Published: 2024-06-08
DOI: 10.1117/12.3031928 (https://doi.org/10.1117/12.3031928)
Citation count: 0
Abstract
Source code clone detection is one of the most fundamental software engineering techniques. Although it has been studied intensively in the past few years, most of that work addresses syntactic code clones, and a number of problems remain in detecting semantic code clones. In this paper, we propose an approach that fine-tunes the pre-trained BERT model on C/C++ code so that it better captures the syntactic and semantic features of C/C++ code, thus enabling better source code similarity evaluation. We evaluated our approach on a large C/C++ code clone dataset, and the results show that it achieves excellent results in semantic code clone detection.
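The similarity-evaluation step can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how similarity is scored, so the cosine measure, the `threshold` value, and the function names `embed`, `cosine_similarity`, and `is_clone` are all assumptions made for illustration. In particular, `embed` is a toy bag-of-tokens stand-in for the fine-tuned BERT encoder. It only captures surface overlap, which is the limitation that motivates a learned encoder for semantic clones.

```python
import math
import re
from collections import Counter


def embed(code: str) -> Counter:
    """Toy stand-in (assumption) for a fine-tuned encoder.

    Returns a sparse bag-of-tokens vector. A real system would return
    a dense vector produced by the fine-tuned model instead.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_clone(code_a: str, code_b: str, threshold: float = 0.8) -> bool:
    """Flag a pair as a clone when similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(embed(code_a), embed(code_b)) >= threshold


if __name__ == "__main__":
    loop_for = "int sum(int n) { int s = 0; for (int i = 0; i < n; i++) s += i; return s; }"
    loop_while = "int total(int k) { int t = 0; int j = 0; while (j < k) { t += j; j++; } return t; }"
    print(cosine_similarity(embed(loop_for), embed(loop_while)))
```

The two functions in the usage example compute the same sum with different loops and identifiers, so they form a semantic clone pair. Swapping `embed` for an encoder that maps such pairs to nearby vectors is where a model fine-tuned on C/C++ code would plug in.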