Neural Comment Generation for Source Code with Auxiliary Code Classification Task

2019 26th Asia-Pacific Software Engineering Conference (APSEC). Publication date: 2019-12-01. DOI: 10.1109/APSEC48747.2019.00076

Minghao Chen, Xiaojun Wan


Citations: 7

Abstract



Code comments help developers understand programs and read and navigate source code, which makes software maintenance more efficient. Unfortunately, much code is inadequately commented or lacks comments altogether, so developers must spend extra time reading the source code itself. In this paper, we propose a new approach to automatically generating comments for source code. Following the intuition behind the sequence-to-sequence (Seq2Seq) models traditionally used for machine translation, we propose a tree-to-sequence (Tree2Seq) model for code comment generation, whose encoder captures the structural information of source code. More importantly, we introduce code classification as an auxiliary task to aid the Tree2Seq model, and we build a multi-task learning model to achieve this goal. We evaluate our models on a benchmark dataset with automatic metrics such as BLEU, ROUGE, and METEOR. Experimental results show that our proposed Tree2Seq model outperforms the traditional attention-based Seq2Seq model, and that our proposed multi-task learning model outperforms state-of-the-art approaches by a substantial margin.
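The abstract does not specify the encoder cell, the classification head, or how the two losses are weighted, so the following is only a minimal, dependency-free sketch of the general idea under stated assumptions. It encodes an AST bottom-up (a hypothetical child-sum scheme: each node's state is `tanh` of its label embedding plus the sum of its children's states). It then combines a comment-generation loss with an auxiliary code-classification loss as a weighted sum. The names `Node`, `encode`, `multitask_loss`, the weight `lam`, the embeddings, and the head `W` are all illustrative assumptions, not the authors' implementation. The attention decoder is omitted, and its per-step output distributions are simply given as inputs.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A toy AST node: a label such as 'MethodDecl' plus its children (hypothetical)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def encode(node: Node, emb: Dict[str, List[float]]) -> List[float]:
    """Bottom-up tree encoding (assumed child-sum scheme):
    h(node) = tanh(emb[label] + sum of h(child) over all children)."""
    acc = list(emb[node.label])
    for child in node.children:
        h_child = encode(child, emb)
        acc = [a + b for a, b in zip(acc, h_child)]
    return [math.tanh(x) for x in acc]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def nll(probs: List[float], target: int) -> float:
    """Negative log-likelihood of the gold index under a probability vector."""
    return -math.log(probs[target])


def multitask_loss(step_probs, comment_ids, class_probs, class_id, lam=0.5):
    """Joint objective (assumed form): mean token NLL of the gold comment
    plus lam times the NLL of the auxiliary code-class prediction."""
    gen = sum(nll(p, t) for p, t in zip(step_probs, comment_ids)) / len(comment_ids)
    return gen + lam * nll(class_probs, class_id)


# Usage on a toy tree; every label, vector, and weight below is made up.
emb = {"MethodDecl": [0.1, 0.0], "Return": [0.0, 0.2], "Name": [0.3, -0.1]}
tree = Node("MethodDecl", [Node("Return", [Node("Name")])])
h = encode(tree, emb)  # root state summarizing the whole tree
W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # hypothetical 3-class linear head
class_probs = softmax([sum(w * x for w, x in zip(row, h)) for row in W])
# Two decoder steps over a 2-token vocabulary, gold comment tokens [0, 1].
loss = multitask_loss([[0.5, 0.5], [0.25, 0.75]], [0, 1], class_probs, class_id=2)
```

Because both tasks read the same tree encoding, gradients from the classification term would also shape the encoder in a trained model, which is the usual motivation for an auxiliary task. A real system would learn `emb`, `W`, and the decoder in a deep-learning framework rather than fixing them by hand.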

