GRRLN: Gated Recurrent Residual Learning Networks for code clone detection
Authors: Xiangping Zhang, Jianxun Liu, Min Shi
Journal: Journal of Software: Evolution and Process
Published: 2024-02-07
DOI: 10.1002/smr.2649 (https://doi.org/10.1002/smr.2649)
Abstract
Code clone detection is a critical problem in software development and maintenance. It aims to identify functionally identical or similar code fragments within an application. Existing work formulates code clone detection as a binary classification problem that predicts whether a code pair is a clone based on a pre-defined threshold. In reality, there are several types of code clones, distinguished by how similar the two code fragments are to each other. To investigate how different formulations of the detection task affect the detection result, we propose Gated Recurrent Residual Learning Networks (GRRLN), a novel neural network model for code clone detection. To train GRRLN, we first represent each code fragment as a sequence of statement-level trees derived from the whole abstract syntax tree (AST). Then, a gated recurrent neural network with residual connections extracts the semantics of the individual statement trees together with their dependency relationships across the input statement sequence. Finally, the representations of code fragments output by GRRLN are used for similarity calculation and clone detection. We evaluate GRRLN on two real-world datasets for code clone detection and clone type classification. Experiments show that GRRLN achieves promising results while requiring significantly less time and memory than state-of-the-art methods.
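The pipeline described in the abstract (statement-level tree sequence, gated recurrent network with residual connections, similarity calculation) can be sketched roughly as below. This is not the authors' implementation. It rests on several assumptions: Python's `ast` module stands in for the parser, a bag of node-type counts stands in for a learned statement-tree encoder, the GRU weights are random and untrained, and max pooling, cosine similarity, and the 0.9 threshold are illustrative choices. All function names (`statement_trees`, `encode`, `gru_residual_step`, `represent`) are hypothetical.

```python
import ast
import math
import random

DIM = 8  # hypothetical embedding size


def statement_trees(source):
    """Split a fragment into statement-level subtrees, in pre-order."""
    out = []

    def visit(node):
        if isinstance(node, ast.stmt):
            out.append(node)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return out


def encode(stmt):
    """Toy statement encoder: counts of node types, bucketed into DIM slots.
    Nested statements are skipped because they get their own sequence entry."""
    vec = [0.0] * DIM
    stack = [stmt]
    while stack:
        node = stack.pop()
        vec[sum(map(ord, type(node).__name__)) % DIM] += 1.0
        stack.extend(c for c in ast.iter_child_nodes(node)
                     if not isinstance(c, ast.stmt))
    return vec


def make_weights(rng):
    """Random, untrained GRU parameters (assumption: no bias terms)."""
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]
    return {name: mat() for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_residual_step(w, x, h):
    """One GRU step; the step output adds the input back (residual connection)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(w["Wz"], x), matvec(w["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(w["Wr"], x), matvec(w["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(w["Wh"], x), matvec(w["Uh"], rh))]
    h_new = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    out = [hn + xi for hn, xi in zip(h_new, x)]
    return h_new, out


def represent(source, w):
    """Run the recurrent layer over the statement sequence, then max-pool."""
    h = [0.0] * DIM
    outs = []
    for stmt in statement_trees(source):
        h, out = gru_residual_step(w, encode(stmt), h)
        outs.append(out)
    return [max(col) for col in zip(*outs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


w = make_weights(random.Random(0))
a = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
b = "def g(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"
sim = cosine(represent(a, w), represent(b, w))
is_clone = sim >= 0.9  # hypothetical pre-defined threshold
```

Because the toy encoder looks only at node types, the two fragments above, which differ only in identifier names, map to the same representation and are flagged as clones. A trained model would be needed to separate the finer-grained clone types the abstract refers to.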
About the journal:
The “Journal of Software: Evolution and Process” is an archival journal that publishes high-quality, state-of-the-art research and practice papers dealing with the conception, development, testing, management, quality, maintenance, and evolution of software, systems, and services, as well as the continuous improvement of processes and capabilities surrounding them. The journal continues the tradition of “The Journal of Software Maintenance and Evolution: Research and Practice” and “Software Process: Improvements and Practice”. We will therefore continue to cover the traditional topics related to software maintenance and evolution as well as software process improvement and practice. At the same time, the concept behind the journal has evolved into a unified vision that recognizes the fundamental changes and transformations occurring in software and systems engineering, and the need to adapt by broadening the topics we address, the research methods used, and the perspectives utilised.
Fundamental changes are occurring in the variety, scale, and scope of the software, systems, and services being developed, from new web and mobile computing to battle theatre technologies and everything in between.