Context-based transfer learning for low resource code summarization

Software: Practice and Experience. Pub Date: 2023-11-20. DOI: 10.1002/spe.3288

Yi Guo, Yu Chai, Lehuan Zhang, Hui Li, Mengzhi Luo, Shikai Guo



Abstract

Source code summaries improve the readability and intelligibility of code, help developers understand programs, and make software maintenance and upgrades more efficient. Unfortunately, code comments in software projects are often mismatched, missing, or outdated, so developers must infer functionality from the source code itself, which slows software maintenance and evolution. Various neural network-based methods have been proposed to address the problem of source code summarization. However, existing work targets resource-rich programming languages such as Java and Python, and these methods may perform poorly on low-resource languages. To address this challenge, we propose a context-based transfer learning model for low resource code summarization (LRCS), which learns common information from a resource-rich language and then transfers it to the target language model for further learning. The model consists of two components: the summary generation component learns the syntactic and semantic information of the code, and the learning transfer component improves the model's generalization ability during cross-language code summarization learning. Experimental results show that LRCS outperforms baseline methods in code summarization in terms of sentence-level BLEU, corpus-level BLEU, and METEOR. For example, LRCS improves corpus-level BLEU scores by 52.90%, 41.10%, and 14.97%, respectively, compared to baseline methods.
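The abstract reports both sentence-level and corpus-level BLEU. As a rough illustration of how the two differ, the sketch below computes a simplified BLEU in both ways. It is not the paper's evaluation script: the n-gram order (`max_n=2`), the add-one smoothing, the single-reference setup, and all function names are assumptions made for this example, and the toy summaries are invented.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def match_stats(hyp, ref, max_n):
    """Per-order (clipped matches, hypothesis n-gram totals) for one pair."""
    matched, total = [], []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        matched.append(sum(min(c, ref_counts[g]) for g, c in hyp_counts.items()))
        total.append(sum(hyp_counts.values()))
    return matched, total


def bleu_from_stats(matched, total, hyp_len, ref_len):
    """Geometric mean of smoothed precisions times the brevity penalty."""
    if hyp_len == 0:
        return 0.0
    # Add-one smoothing is an assumption of this sketch, not the paper's choice.
    log_p = [math.log((m + 1) / (t + 1)) for m, t in zip(matched, total)]
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(sum(log_p) / len(log_p))


def sentence_bleu(hyp, ref, max_n=2):
    """BLEU for a single generated summary against a single reference."""
    matched, total = match_stats(hyp, ref, max_n)
    return bleu_from_stats(matched, total, len(hyp), len(ref))


def mean_sentence_bleu(hyps, refs, max_n=2):
    """Sentence-level view: score each pair, then average the scores."""
    scores = [sentence_bleu(h, r, max_n) for h, r in zip(hyps, refs)]
    return sum(scores) / len(scores)


def corpus_bleu(hyps, refs, max_n=2):
    """Corpus-level view: pool n-gram statistics over all pairs, then score once."""
    matched, total = [0] * max_n, [0] * max_n
    for h, r in zip(hyps, refs):
        m, t = match_stats(h, r, max_n)
        matched = [a + b for a, b in zip(matched, m)]
        total = [a + b for a, b in zip(total, t)]
    return bleu_from_stats(matched, total, sum(map(len, hyps)), sum(map(len, refs)))


# Invented toy data: generated summaries vs. reference comments.
hyps = ["returns the sum of two numbers".split(), "open file".split()]
refs = ["returns the sum of two integers".split(), "opens the given file for reading".split()]
print("mean sentence-level BLEU:", mean_sentence_bleu(hyps, refs))
print("corpus-level BLEU:", corpus_bleu(hyps, refs))
```

Because the corpus-level score pools counts before taking precisions, long summaries weigh more than short ones, whereas the sentence-level average treats every summary equally. This is one reason the two metrics are typically reported side by side.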

