Learn To Align: A Code Alignment Network For Code Clone Detection
Aiping Zhang, Kui Liu, Liming Fang, Qianjun Liu, Xinyu Yun, S. Ji
2021 28th Asia-Pacific Software Engineering Conference (APSEC), published 2021-12-01
DOI: 10.1109/APSEC53868.2021.00008 (https://doi.org/10.1109/APSEC53868.2021.00008)
Citations: 1
Abstract
Deep learning techniques have achieved promising results in code clone detection over the past decade. However, existing techniques focus mainly on extracting more discriminative features from source code, while issues such as structural differences between functionally similar code fragments are not explicitly addressed. Such differences are common when programmers copy a code segment and then add or remove several statements, or use a more flexible syntactic structure to implement the same function. In this paper, we unify these problems as the problem of code misalignment and propose a novel code alignment network to tackle it. We design a bi-directional causal convolutional neural network to extract feature representations of code fragments that carry rich structural and semantic information. After feature extraction, our method learns to align the two code fragments in a data-driven fashion. We present two independent strategies for code alignment: attention-based alignment and sparse reconstruction-based alignment. Both strategies learn an alignment matrix that represents the correspondences between the two code fragments. Our method outperforms state-of-the-art methods in terms of F1 score by 0.5% and 3.1% on BigCloneBench and OJClone, respectively. Our code is available at https://github.com/ArcticHare105/Code-Alignment.
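To make the idea of an alignment matrix concrete, the following is a minimal sketch of a generic soft-attention alignment between two sequences of feature vectors. It is not taken from the paper or its repository: the scaled dot-product scoring, the row-wise softmax, and the names `softmax`, `attention_alignment`, `feats_a`, and `feats_b` are all assumptions chosen for illustration, and the toy feature vectors stand in for the output of a feature extractor.

```python
import math


def softmax(row):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_alignment(feats_a, feats_b):
    """Softly align fragment B to fragment A (illustrative assumption only).

    feats_a: m feature vectors of dimension d (one per position in A)
    feats_b: n feature vectors of dimension d (one per position in B)
    Returns (alignment, aligned_b), where alignment is an m x n matrix whose
    rows sum to 1, and aligned_b holds m vectors, each a weighted mix of B.
    """
    dim = len(feats_a[0])
    scale = math.sqrt(dim)
    alignment = []
    for a in feats_a:
        # Scaled dot-product similarity between one position of A and all of B.
        scores = [sum(x * y for x, y in zip(a, b)) / scale for b in feats_b]
        alignment.append(softmax(scores))
    # Re-express B in A's positions using the alignment weights.
    aligned_b = [
        [sum(w * b[k] for w, b in zip(row, feats_b)) for k in range(dim)]
        for row in alignment
    ]
    return alignment, aligned_b


# Toy features: B contains one extra position compared with A.
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
alignment, aligned_b = attention_alignment(feats_a, feats_b)
for row in alignment:
    print([round(w, 3) for w in row])
```

Each row of `alignment` gives the weights with which one position of the first fragment attends to every position of the second, so fragments of different lengths can still be compared position by position after alignment.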