A Simple Retrieval-based Method for Code Comment Generation

Xiaoning Zhu, Chaofeng Sha, Junyu Niu
Published in: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Publication date: 2022-03-01
DOI: 10.1109/saner53432.2022.00126 (https://doi.org/10.1109/saner53432.2022.00126)
Citations: 2

Abstract

Code comments can effectively help developers comprehend programs. However, writing good comments for source code is challenging and time-consuming, so automatic generation of code comments is a promising research direction. Recently, researchers have leveraged neural machine translation to generate comments from source code and achieved impressive results. Another line of work has exploited information retrieval (IR) techniques and shown substantial performance improvements on this task. However, current retrieval-based methods usually involve complex retrieval and editing operations, which are difficult to implement. To address these problems, we propose kNN-Transformer, a simple end-to-end retrieval-based code comment generation method. Our method combines a simple nearest neighbor retrieval module with a powerful transformer-based model. When generating each token, the retrieval module estimates a probability distribution that depends on the current translation context, rather than obtaining the retrieved samples in advance. Experimental results on four widely used public datasets (two Java datasets and two Python datasets) demonstrate that our method outperforms all the baselines, and that our $k$NN retrieval module brings significant improvement when similar code snippets are available.
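The abstract does not give the formula behind the per-token retrieval step, so the following is only a minimal sketch of one common way such a step can work: nearest-neighbour interpolation in the style of kNN-augmented language models. The datastore layout, the distance function, the `temperature` and `lam` parameters, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens.

    `datastore` is a list of (key_vector, target_token) pairs. This layout
    and every name here are hypothetical.
    """
    # Squared Euclidean distance from the query to every stored key.
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    ]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Softmax over negative distances; neighbours that share a token
    # pool their probability mass.
    weights = defaultdict(float)
    for dist, token in nearest:
        weights[token] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_model, p_knn, lam=0.5):
    """Mix the model's token distribution with the retrieval distribution."""
    vocab = set(p_model) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_model.get(tok, 0.0)
        for tok in vocab
    }


# Toy datastore: context vectors paired with the token that followed them.
datastore = [
    ((0.0, 0.0), "returns"),
    ((0.1, 0.0), "returns"),
    ((5.0, 5.0), "sets"),
]
# Assumed output of the transformer for the current decoding step.
p_model = {"returns": 0.4, "sets": 0.6}
# The query stands in for the decoder's current context representation.
p_knn = knn_distribution((0.0, 0.1), datastore, k=2)
p_final = interpolate(p_model, p_knn, lam=0.5)
```

In this toy setup both nearest neighbours vote for "returns", so the mixed distribution favours it even though the assumed model distribution preferred "sets". This mirrors the abstract's remark that retrieval helps most when similar code snippets are available.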