Context-based transfer learning for low resource code summarization
Authors: Yi Guo, Yu Chai, Lehuan Zhang, Hui Li, Mengzhi Luo, Shikai Guo
Journal: Software: Practice and Experience
Published: 2023-11-20 (Journal Article)
DOI: 10.1002/spe.3288 (https://doi.org/10.1002/spe.3288)
Citations: 0
Abstract
Source code summaries improve the readability and intelligibility of code, help developers understand programs, and make software maintenance and upgrades more efficient. Unfortunately, code comments in software projects are often mismatched, missing, or outdated, so developers must infer functionality from the source code itself, which slows software maintenance and evolution. Various neural network-based methods have been proposed for source code summarization. However, existing work targets resource-rich programming languages such as Java and Python, and these methods may not perform well on low-resource languages. To address this challenge, we propose a context-based transfer learning model for low resource code summarization (LRCS), which learns common information from a resource-rich language and then transfers it to the target language model for further learning. LRCS consists of two components: the summary generation component learns the syntactic and semantic information of the code, and the transfer learning component improves the generalization ability of the model during cross-language code summarization learning. Experimental results show that LRCS outperforms baseline methods in code summarization in terms of sentence-level BLEU, corpus-level BLEU, and METEOR. For example, LRCS improves corpus-level BLEU scores by 52.90%, 41.10%, and 14.97%, respectively, compared to baseline methods.
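The abstract reports both sentence-level and corpus-level BLEU. As a rough illustration of how the two can differ, the sketch below averages per-sentence scores in one function and pools n-gram statistics across the whole test set in the other. It is an unsmoothed, single-reference simplification written for this note and is not the paper's evaluation script; all function names are hypothetical, and the toy sentences are invented for the example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_stats(candidate, reference, max_n=4):
    """Return per-order (clipped matches, candidate n-gram total) pairs."""
    stats = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        stats.append((matches, sum(cand.values())))
    return stats


def bleu_from_stats(stats, cand_len, ref_len):
    """Geometric mean of n-gram precisions times the brevity penalty."""
    if cand_len == 0 or any(m == 0 or t == 0 for m, t in stats):
        return 0.0  # unsmoothed (assumption): one empty precision zeroes the score
    log_precision = sum(math.log(m / t) for m, t in stats) / len(stats)
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)


def sentence_bleu_mean(candidates, references, max_n=4):
    """Sentence-level view: score every pair on its own, then average."""
    scores = [
        bleu_from_stats(clipped_stats(c, r, max_n), len(c), len(r))
        for c, r in zip(candidates, references)
    ]
    return sum(scores) / len(scores)


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level view: pool n-gram statistics over all pairs, score once."""
    totals = [[0, 0] for _ in range(max_n)]
    cand_len = ref_len = 0
    for c, r in zip(candidates, references):
        for i, (m, t) in enumerate(clipped_stats(c, r, max_n)):
            totals[i][0] += m
            totals[i][1] += t
        cand_len += len(c)
        ref_len += len(r)
    return bleu_from_stats(totals, cand_len, ref_len)


# Toy data (invented): one perfect summary and one short, partly wrong one.
candidates = [
    "returns the sum of two numbers".split(),
    "opens a file".split(),
]
references = [
    "returns the sum of two numbers".split(),
    "opens the file for reading".split(),
]

sentence_score = sentence_bleu_mean(candidates, references)
corpus_score = corpus_bleu(candidates, references)
print(f"mean sentence-level BLEU: {sentence_score:.3f}")
print(f"corpus-level BLEU:        {corpus_score:.3f}")
```

In this toy case the short second summary has no matching bigrams, so its unsmoothed sentence-level score is zero and drags the average down, while the pooled corpus-level score still credits its unigram matches. Published evaluations commonly apply smoothing to the sentence-level variant, so the numbers here are only illustrative.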