A Simple Retrieval-based Method for Code Comment Generation

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Pub Date: 2022-03-01. DOI: 10.1109/saner53432.2022.00126

Xiaoning Zhu, Chaofeng Sha, Junyu Niu

Citations: 2

Abstract

Code comments help developers comprehend programs, but writing good comments for source code is challenging and time-consuming, which makes automatic comment generation a promising research direction. Recently, researchers have leveraged neural machine translation to generate comments from source code and achieved impressive results. Another line of work exploits information retrieval (IR) techniques and has shown strong performance improvements on this task. However, current retrieval-based methods usually involve complex retrieval and editing operations that are difficult to implement. To tackle these problems, we propose kNN-Transformer, a simple end-to-end retrieval-based code comment generation method. Our method combines a simple nearest-neighbor retrieval module with a powerful Transformer-based model. When generating each token, the retrieval module estimates a probability distribution from the current translation context rather than obtaining retrieved samples in advance. Experimental results on four widely used public datasets (two Java and two Python) demonstrate that our method outperforms all baselines, and that the kNN retrieval module brings significant improvement when similar code snippets are available.
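
The abstract does not spell out how the per-token retrieval distribution is computed or combined with the model. The sketch below illustrates one common way such a step can work, under several assumptions: a datastore of (context vector, next token) pairs, a softmax over negative squared distances to the k nearest entries, and a fixed-weight interpolation with the model's own distribution. The names `knn_distribution`, `interpolate`, `temperature`, and `lam`, as well as the toy vectors and tokens, are hypothetical and are not taken from the paper. The datastore is assumed to be non-empty.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Estimate a next-token distribution from the k nearest datastore entries.

    datastore: list of (key_vector, next_token) pairs (assumed layout).
    """
    # Squared Euclidean distance from the query context to every stored key.
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    ]
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]

    # Softmax over negative distances; shift by the smallest distance
    # so the exponentials stay numerically stable.
    d_min = neighbors[0][0]
    weights = [(math.exp(-(d - d_min) / temperature), tok) for d, tok in neighbors]
    total = sum(w for w, _ in weights)

    # Neighbors that share a token pool their probability mass.
    probs = defaultdict(float)
    for w, tok in weights:
        probs[tok] += w / total
    return dict(probs)


def interpolate(p_model, p_knn, lam=0.5):
    """Mix the retrieval and model distributions with weight lam (assumed scheme)."""
    vocab = set(p_model) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_model.get(t, 0.0)
        for t in vocab
    }


# Toy data: 2-d "context vectors" and the comment token that followed each one.
datastore = [
    ([0.0, 0.0], "returns"),
    ([0.1, 0.0], "returns"),
    ([5.0, 5.0], "sets"),
]
p_knn = knn_distribution([0.0, 0.1], datastore, k=2)
p_model = {"returns": 0.4, "sets": 0.6}
p_final = interpolate(p_model, p_knn, lam=0.5)
```

In this toy case both nearest neighbors vote for the same token, so the retrieval distribution shifts the final prediction away from the model's original preference. A real system would derive the context vectors from decoder states and use an approximate nearest-neighbor index rather than a linear scan.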
