Ensemble Models for Neural Source Code Summarization of Subroutines

Alexander LeClair, Aakash Bansal, Collin McMillan
{"title":"神经系统集成模型子程序的源代码摘要","authors":"Alexander LeClair, Aakash Bansal, Collin McMillan","doi":"10.26226/morressier.613b5418842293c031b5b62e","DOIUrl":null,"url":null,"abstract":"A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of writing these summaries. At present, most state-of-the-art approaches for code summarization are neural network-based solutions akin to seq2seq, graph2seq, and other encoder-decoder architectures. The input to the encoder is source code, while the decoder helps predict the natural language summary. While these models tend to be similar in structure, evidence is emerging that different models make different contributions to prediction quality - differences in model performance are orthogonal and complementary rather than uniform over the entire dataset. In this paper, we explore the orthogonal nature of different neural code summarization approaches and propose ensemble models to exploit this orthogonality for better overall performance. We demonstrate that a simple ensemble strategy boosts performance by up to 14.8%, and provide an explanation for this boost. The takeaway from this work is that a relatively small change to the inference procedure in most neural code summarization techniques leads to outsized improvements in prediction quality.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Ensemble Models for Neural Source Code Summarization of Subroutines\",\"authors\":\"Alexander LeClair, Aakash Bansal, Collin McMillan\",\"doi\":\"10.26226/morressier.613b5418842293c031b5b62e\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of writing these summaries. At present, most state-of-the-art approaches for code summarization are neural network-based solutions akin to seq2seq, graph2seq, and other encoder-decoder architectures. The input to the encoder is source code, while the decoder helps predict the natural language summary. While these models tend to be similar in structure, evidence is emerging that different models make different contributions to prediction quality - differences in model performance are orthogonal and complementary rather than uniform over the entire dataset. In this paper, we explore the orthogonal nature of different neural code summarization approaches and propose ensemble models to exploit this orthogonality for better overall performance. We demonstrate that a simple ensemble strategy boosts performance by up to 14.8%, and provide an explanation for this boost. 
The takeaway from this work is that a relatively small change to the inference procedure in most neural code summarization techniques leads to outsized improvements in prediction quality.\",\"PeriodicalId\":205629,\"journal\":{\"name\":\"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.26226/morressier.613b5418842293c031b5b62e\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26226/morressier.613b5418842293c031b5b62e","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of writing these summaries. At present, most state-of-the-art approaches for code summarization are neural network-based solutions akin to seq2seq, graph2seq, and other encoder-decoder architectures. The input to the encoder is source code, while the decoder helps predict the natural language summary. While these models tend to be similar in structure, evidence is emerging that different models make different contributions to prediction quality - differences in model performance are orthogonal and complementary rather than uniform over the entire dataset. In this paper, we explore the orthogonal nature of different neural code summarization approaches and propose ensemble models to exploit this orthogonality for better overall performance. We demonstrate that a simple ensemble strategy boosts performance by up to 14.8%, and provide an explanation for this boost. The takeaway from this work is that a relatively small change to the inference procedure in most neural code summarization techniques leads to outsized improvements in prediction quality.
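The ensemble the abstract describes operates at inference time: each trained model proposes a distribution over the next summary token, and the distributions are combined before a token is chosen. The sketch below is a minimal illustration of that idea, not the authors' implementation; the toy vocabulary, the stand-in decoders model_a and model_b, and the mean-combination rule are all assumptions made for the example.

```python
import numpy as np

# Toy vocabulary for illustration only; a real system would use the
# summary-token vocabulary of a trained code summarization model.
VOCAB = ["<s>", "</s>", "returns", "the", "sum", "of", "two", "numbers"]
END = VOCAB.index("</s>")

def softmax(logits):
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

def model_a(prefix):
    """Stand-in decoder: next-token distribution given the summary so far."""
    rng = np.random.default_rng(len(prefix))  # deterministic toy logits
    return softmax(rng.normal(size=len(VOCAB)))

def model_b(prefix):
    """A second stand-in decoder with different (toy) behavior."""
    rng = np.random.default_rng(len(prefix) + 100)
    return softmax(rng.normal(size=len(VOCAB)))

def ensemble_greedy_decode(models, max_len=10):
    """Greedy decoding that averages each model's next-token distribution.

    The trained models themselves are untouched; only the per-step
    prediction is combined, i.e. an inference-time ensemble.
    """
    prefix = [VOCAB.index("<s>")]
    for _ in range(max_len):
        # Mean of the per-model distributions: a simple prediction ensemble.
        avg = np.mean([m(prefix) for m in models], axis=0)
        nxt = int(np.argmax(avg))
        prefix.append(nxt)
        if nxt == END:
            break
    return " ".join(VOCAB[i] for i in prefix[1:] if i != END)

if __name__ == "__main__":
    print(ensemble_greedy_decode([model_a, model_b]))
```

Because the combination happens per decoding step, no retraining is required, which is consistent with the abstract's point that a relatively small change to the inference procedure can yield the reported gains of up to 14.8%.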