Learn To Align: A Code Alignment Network For Code Clone Detection

2021 28th Asia-Pacific Software Engineering Conference (APSEC). Pub Date: 2021-12-01. DOI: 10.1109/APSEC53868.2021.00008

Aiping Zhang, Kui Liu, Liming Fang, Qianjun Liu, Xinyu Yun, S. Ji

Citations: 1

Abstract

Deep learning techniques have achieved promising results in code clone detection over the past decade. However, existing techniques focus merely on extracting more discriminative features from source code, while some issues, such as structural differences between functionally similar code fragments, are not explicitly addressed. Such differences are common when programmers copy a code segment and add or remove several statements, or use a more flexible syntactic structure to implement the same functionality. In this paper, we unify these problems as the problem of code misalignment and propose a novel code alignment network to tackle it. We design a bi-directional causal convolutional neural network to extract feature representations of code fragments that carry rich structural and semantic information. After feature extraction, our method learns to align the two code fragments in a data-driven fashion. We present two independent strategies for code alignment, namely attention-based alignment and sparse reconstruction-based alignment. Both strategies learn an alignment matrix that represents the correspondences between two code fragments. Our method outperforms state-of-the-art methods in F1 score by 0.5% and 3.1% on BigCloneBench and OJClone, respectively. Our code is available at https://github.com/ArcticHare105/Code-Alignment.
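To make the notion of an alignment matrix concrete, here is a minimal sketch of attention-style alignment. It assumes dot-product scores normalized with a row-wise softmax. It is not the authors' implementation: the helper names (`alignment_matrix`, `align`, `aligned_similarity`) and the mean-cosine clone score are hypothetical. The small hand-made vectors also stand in for the learned per-token features that the bi-directional causal CNN would produce.

```python
import math
from typing import List

Matrix = List[List[float]]  # one feature vector per token/statement


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Assumed scoring: row i holds soft correspondences of a[i] over every b[j]."""
    return [softmax([dot(ai, bj) for bj in b]) for ai in a]


def align(a: Matrix, b: Matrix) -> Matrix:
    """Re-express b in a's order: aligned[i] = sum_j W[i][j] * b[j]."""
    w = alignment_matrix(a, b)
    dim = len(b[0])
    return [[sum(w_ij * bj[k] for w_ij, bj in zip(wi, b)) for k in range(dim)] for wi in w]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def aligned_similarity(a: Matrix, b: Matrix) -> float:
    """Hypothetical clone score: mean cosine between a[i] and its aligned counterpart."""
    aligned = align(a, b)
    return sum(cosine(ai, bi) for ai, bi in zip(a, aligned)) / len(a)


if __name__ == "__main__":
    # Toy features: b swaps the two statements of a and inserts an extra one.
    a = [[5.0, 0.0], [0.0, 5.0]]
    b = [[0.0, 5.0], [1.0, 1.0], [5.0, 0.0]]
    print("positional cosine of first rows:", cosine(a[0], b[0]))
    print("aligned similarity:", round(aligned_similarity(a, b), 3))
```

In the toy input, comparing rows position by position gives a cosine of 0 for the first pair, even though `b` contains every statement of `a`. After alignment, each row of `a` is matched to its counterpart despite the reordering and the inserted statement, and the score approaches 1. The sparse reconstruction-based strategy named in the abstract derives its alignment matrix differently. Its details are not given here, so it is not sketched.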
