RetCom: Information Retrieval-Enhanced Automatic Source-Code Summarization

2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS). Pub Date: 2022-12-01. DOI: 10.1109/QRS57517.2022.00099

Yubo Zhang, Yanfang Liu, Xinxin Fan, Yunfeng Lu


Citations: 0

Abstract

Automated source-code summarization (SCS), i.e., generating natural-language descriptions for source code, has become an important research topic in recent years because it can save software engineers development time and improve work efficiency. Existing SCS methods fall into two categories: information retrieval (IR)-based SCS and neural-based SCS. Neural-based methods are currently the mainstream, but they struggle to generate low-frequency words, which can degrade performance. To address this problem, we propose RetCom, an IR-enhanced neural SCS method that improves the prediction of low-frequency words by leveraging both structural-level and semantic-level code retrieval. We further design a token-level, context-dependent mixture network to fuse three information sources: the original code, the structurally most similar code, and the semantically most similar code. Extensive experiments on two real-world datasets show that, compared with several baseline methods, RetCom captures more low-frequency words and achieves superior performance.
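The abstract does not give the formulation of the token-level mixture network, so the following is only a minimal sketch of the general idea, under stated assumptions: at each decoding step, three next-token distributions (from the original code, the structurally most similar code, and the semantically most similar code) are combined with weights produced by a gate that depends on the current context. The function names, the toy vocabulary, the probabilities, and the gate scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_token_distributions(dists, gate_scores):
    """Fuse per-source next-token distributions for one decoding step.

    dists: one vocabulary distribution per source (each sums to 1).
    gate_scores: one unnormalized score per source; assumed here to come
        from a learned gate conditioned on the decoder state (hypothetical).
    """
    weights = softmax(gate_scores)
    vocab_size = len(dists[0])
    return [
        sum(w * d[i] for w, d in zip(weights, dists))
        for i in range(vocab_size)
    ]


# Toy vocabulary; the last word stands in for a low-frequency token.
vocab = ["return", "the", "value", "debounce"]
p_original = [0.50, 0.30, 0.19, 0.01]    # decoder over the original code
p_structural = [0.20, 0.20, 0.20, 0.40]  # structurally most similar code
p_semantic = [0.10, 0.10, 0.10, 0.70]    # semantically most similar code

# Hypothetical gate scores for this step, favoring the retrieved sources.
gate_scores = [0.0, 1.0, 2.0]

mixed = mix_token_distributions(
    [p_original, p_structural, p_semantic], gate_scores
)
best = vocab[max(range(len(vocab)), key=mixed.__getitem__)]
print(best, [round(p, 3) for p in mixed])
```

In this toy setting the original-code distribution assigns the rare word almost no probability, while the mixture raises it enough to be selected. Because the weights are recomputed at every step, the model can rely on retrieved code for some tokens and on the original code for others, which is one plausible reading of "token-level context-dependent".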
