Context-based transfer learning for low resource code summarization

Yi Guo, Yu Chai, Lehuan Zhang, Hui Li, Mengzhi Luo, Shikai Guo
Journal: Software: Practice and Experience
DOI: 10.1002/spe.3288
Published: 2023-11-20 (Journal Article)
Citations: 0

Abstract

Source code summaries improve the readability and intelligibility of code, help developers understand programs, and make software maintenance and upgrades more efficient. Unfortunately, code comments in software projects are often mismatched, missing, or outdated, forcing developers to infer functionality from the source code itself and hampering software maintenance and evolution. Various neural-network-based methods have been proposed for code summarization, but most existing work targets resource-rich programming languages such as Java and Python, and these methods may perform poorly on low-resource languages. To address this challenge, we propose a context-based transfer learning model for low-resource code summarization (LRCS), which learns common information from a resource-rich language and then transfers it to the target-language model for further learning. LRCS consists of two components: a summary generation component that learns the syntactic and semantic information of the code, and a learning transfer component that improves the model's generalization ability during cross-language code summarization. Experimental results show that LRCS outperforms the baseline methods in terms of sentence-level BLEU, corpus-level BLEU, and METEOR. For example, LRCS improves corpus-level BLEU scores over the baseline methods by 52.90%, 41.10%, and 14.97%, respectively.
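The pretrain-then-fine-tune transfer pattern the abstract describes can be sketched with a deliberately tiny, hypothetical stand-in. This is an illustration of the general technique only, not the paper's LRCS architecture: the `TokenSummarizer` class, the corpus pairs, and the `weight` heuristic are all invented for the example.

```python
from collections import defaultdict, Counter

class TokenSummarizer:
    """Toy frequency model (hypothetical, not the paper's model):
    learns code-token -> comment-word associations, first on a
    high-resource corpus, then fine-tuned on the low-resource one."""

    def __init__(self):
        self.assoc = defaultdict(Counter)

    def train(self, pairs, weight=1):
        # pairs: list of (code_tokens, comment_tokens)
        for code, comment in pairs:
            for code_tok in code:
                for word in comment:
                    self.assoc[code_tok][word] += weight

    def summarize_token(self, code_token):
        counts = self.assoc.get(code_token)
        return counts.most_common(1)[0][0] if counts else "<unk>"

# "Pretrain" on a resource-rich language (Java-style pairs)...
model = TokenSummarizer()
model.train([(["ArrayList", "add"], ["append", "element"]),
             (["HashMap", "get"], ["lookup", "key"])])
# ...then fine-tune on a handful of low-resource target pairs,
# weighted higher so target-language evidence dominates.
model.train([(["table.insert"], ["append"])], weight=3)
```

The transfer here is trivially shared count statistics; in LRCS the transferred knowledge is instead the learned syntactic and semantic representations, but the two-stage training schedule is the same shape.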
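The sentence-level BLEU metric used in the evaluation measures clipped n-gram overlap between a generated summary and a reference comment. A minimal smoothed variant can be computed as follows; this is a sketch of the standard metric with add-one smoothing, since the abstract does not specify the exact smoothing scheme used in the experiments.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Smoothed sentence-level BLEU with uniform n-gram weights."""
    if not candidate:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped n-gram matches: each candidate n-gram counts at
        # most as often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing keeps a zero match count from
        # collapsing the geometric mean to zero
        log_prec += math.log((overlap + 1) / (total + 1))
    # brevity penalty discourages overly short candidates
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_prec / max_n)
```

Corpus-level BLEU, the metric behind the reported 52.90%/41.10%/14.97% improvements, instead pools the clipped match and total counts over the whole test set before taking the geometric mean.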