Interpretation-based Code Summarization

Mingyang Geng, Shangwen Wang, Dezun Dong, Hao Wang, Shaomeng Cao, Kechi Zhang, Zhi Jin
{"title":"Interpretation-based Code Summarization","authors":"Mingyang Geng, Shangwen Wang, Dezun Dong, Hao Wang, Shaomeng Cao, Kechi Zhang, Zhi Jin","doi":"10.1109/ICPC58990.2023.00026","DOIUrl":null,"url":null,"abstract":"Code comment, i.e., the natural language text to describe the semantic of a code snippet, is an important way for developers to comprehend the code. Recently, a number of approaches have been proposed to automatically generate the comment given a code snippet, aiming at facilitating the comprehension activities of developers. Despite that state-of-the-art approaches have already utilized advanced machine learning techniques such as the Transformer model, they often ignore critical information of the source code, leading to the inaccuracy of the generated summarization. In this paper, to boost the effectiveness of code summarization, we propose a two-stage paradigm, where in the first stage, we train an off-the-shelf model and then identify its focuses when generating the initial summarization, through a model interpretation approach, and in the second stage, we reinforce the model to generate more qualified summarization based on the source code and its focuses. Our intuition is that in such a manner the model could learn to identify what critical information in the code has been captured and what has been missed in its initial summarization, and thus revise its initial summarization accordingly, just like how a human student learns to write high-quality summarization for a natural language text. Extensive experiments on two large-scale datasets show that our approach can boost the effectiveness of five state-of-the-art code summarization approaches significantly. Specifically, for the well-known code summarizer, DeepCom, utilizing our two-stage paradigm can increase its BLEU-4 values by around 30% and 25% on the two datasets, respectively.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPC58990.2023.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

A code comment, i.e., the natural language text that describes the semantics of a code snippet, is an important aid for developers in comprehending code. Recently, a number of approaches have been proposed to automatically generate a comment for a given code snippet, aiming to facilitate developers' comprehension activities. Although state-of-the-art approaches already utilize advanced machine learning techniques such as the Transformer model, they often ignore critical information in the source code, leading to inaccurate generated summaries. In this paper, to boost the effectiveness of code summarization, we propose a two-stage paradigm: in the first stage, we train an off-the-shelf model and then identify, through a model interpretation approach, what it focuses on when generating the initial summary; in the second stage, we reinforce the model to generate a higher-quality summary based on the source code and those focuses. Our intuition is that in this manner the model can learn to identify which critical information in the code has been captured and which has been missed in its initial summary, and thus revise the initial summary accordingly, much as a human student learns to write a high-quality summary of a natural language text. Extensive experiments on two large-scale datasets show that our approach significantly boosts the effectiveness of five state-of-the-art code summarization approaches. Specifically, for the well-known code summarizer DeepCom, our two-stage paradigm increases its BLEU-4 scores by around 30% and 25% on the two datasets, respectively.
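The abstract describes the two-stage paradigm only at a high level. Below is a minimal, runnable sketch of the idea, using a toy stand-in for the neural summarizer and approximating the "focuses" by the code tokens that received the most decoder attention. All class, method, and token names here are illustrative assumptions; the paper's actual models, interpretation method, and training procedure may differ.

```python
# Hypothetical sketch of the two-stage paradigm (not the paper's implementation).
import numpy as np

class ToySummarizer:
    """Stand-in for an off-the-shelf summarizer (e.g., a Transformer)."""
    def generate_with_attention(self, code_tokens):
        # Pretend decoding: return a fixed summary plus a (steps x tokens)
        # attention matrix. A real model produces both during decoding.
        rng = np.random.default_rng(0)
        attention = rng.random((4, len(code_tokens)))
        attention /= attention.sum(axis=1, keepdims=True)
        return "returns the maximum value", attention

    def generate(self, tokens):
        # A real second-stage model is reinforced to revise the initial
        # summary given the code and its focuses; here we just return text.
        return "returns the maximum element of the input list"

def stage_one(model, code_tokens, k=3):
    """Initial summary + interpretation: which tokens did the model focus on?"""
    summary, attention = model.generate_with_attention(code_tokens)
    scores = attention.sum(axis=0)            # attention mass per code token
    top = np.argsort(scores)[::-1][:k]        # k most-attended token positions
    focuses = [code_tokens[i] for i in sorted(top)]
    return summary, focuses

def stage_two(model, code_tokens, focuses):
    """Regenerate, conditioning on the code plus the stage-one focuses."""
    return model.generate(code_tokens + ["<focus>"] + focuses)

code = "def find_max ( xs ) : return max ( xs )".split()
initial, focuses = stage_one(ToySummarizer(), code)
revised = stage_two(ToySummarizer(), code, focuses)
print(initial, "->", revised, "| focuses:", focuses)
```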
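For reference, the BLEU-4 scores reported above are typically computed by averaging 1- to 4-gram precision between generated and reference summaries. A minimal example using NLTK follows; the paper's exact evaluation script, tokenization, and smoothing settings are assumptions here and may differ.

```python
# Illustrative BLEU-4 computation with NLTK (not the paper's exact setup).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One reference summary per hypothesis; tokens are hypothetical examples.
references = [[["returns", "the", "maximum", "element", "of", "a", "list"]]]
hypotheses = [["returns", "the", "maximum", "value", "of", "a", "list"]]

# weights=(0.25,)*4 averages 1- to 4-gram precision, i.e., BLEU-4.
score = corpus_bleu(
    references, hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.3f}")
```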