FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE) Pub Date : 2022-05-01 DOI:10.1145/3510003.3510069

Jinhao Dong, Yiling Lou, Qihao Zhu, Zeyu Sun, Zhilin Li, Wenjie Zhang, Dan Hao

{"title":"FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation","authors":"Jinhao Dong, Yiling Lou, Qihao Zhu, Zeyu Sun, Zhilin Li, Wenjie Zhang, Dan Hao","doi":"10.1145/3510003.3510069","DOIUrl":null,"url":null,"abstract":"Commit messages summarize code changes of each commit in nat-ural language, which help developers understand code changes without digging into detailed implementations and play an essen-tial role in comprehending software evolution. To alleviate human efforts in writing commit messages, researchers have proposed var-ious automated techniques to generate commit messages, including template-based, information retrieval-based, and learning-based techniques. Although promising, previous techniques have limited effectiveness due to their coarse-grained code change representations. This work proposes a novel commit message generation technique, FIRA, which first represents code changes via fine-grained graphs and then learns to generate commit messages automati-cally. Different from previous techniques, FIRA represents the code changes with fine-grained graphs, which explicitly describe the code edit operations between the old version and the new version, and code tokens at different granularities (i.e., sub-tokens and integral tokens). Based on the graph-based representation, FIRA generates commit messages by a generation model, which includes a graph-neural-network-based encoder and a transformer-based decoder. To make both sub-tokens and integral tokens as available ingredients for commit message generation, the decoder is further incorporated with a novel dual copy mechanism. We further per-form an extensive study to evaluate the effectiveness of FIRA. Our quantitative results show that FIRA outperforms state-of-the-art techniques in terms of BLEU, ROUGE-L, and METEOR; and our ablation analysis further shows that major components in our technique both positively contribute to the effectiveness of FIRA. In addition, we further perform a human study to evaluate the quality of generated commit messages from the perspective of developers, and the results consistently show the effectiveness of FIRA over the compared techniques.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510003.3510069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 24

Abstract

Commit messages summarize code changes of each commit in nat-ural language, which help developers understand code changes without digging into detailed implementations and play an essen-tial role in comprehending software evolution. To alleviate human efforts in writing commit messages, researchers have proposed var-ious automated techniques to generate commit messages, including template-based, information retrieval-based, and learning-based techniques. Although promising, previous techniques have limited effectiveness due to their coarse-grained code change representations. This work proposes a novel commit message generation technique, FIRA, which first represents code changes via fine-grained graphs and then learns to generate commit messages automati-cally. Different from previous techniques, FIRA represents the code changes with fine-grained graphs, which explicitly describe the code edit operations between the old version and the new version, and code tokens at different granularities (i.e., sub-tokens and integral tokens). Based on the graph-based representation, FIRA generates commit messages by a generation model, which includes a graph-neural-network-based encoder and a transformer-based decoder. To make both sub-tokens and integral tokens as available ingredients for commit message generation, the decoder is further incorporated with a novel dual copy mechanism. We further per-form an extensive study to evaluate the effectiveness of FIRA. Our quantitative results show that FIRA outperforms state-of-the-art techniques in terms of BLEU, ROUGE-L, and METEOR; and our ablation analysis further shows that major components in our technique both positively contribute to the effectiveness of FIRA. In addition, we further perform a human study to evaluate the quality of generated commit messages from the perspective of developers, and the results consistently show the effectiveness of FIRA over the compared techniques.

查看原文本刊更多论文

FIRA:用于自动提交消息生成的细粒度基于图的代码更改表示

提交消息用自然语言总结了每次提交的代码更改，这有助于开发人员在不深入研究详细实现的情况下理解代码更改，并且在理解软件演进中发挥重要作用。为了减少编写提交消息的人力，研究人员提出了各种自动化技术来生成提交消息，包括基于模板的、基于信息检索的和基于学习的技术。虽然以前的技术很有前途，但由于其粗粒度的代码更改表示，其有效性有限。这项工作提出了一种新的提交消息生成技术，FIRA，它首先通过细粒度图表示代码更改，然后学习自动生成提交消息。与以前的技术不同，FIRA用细粒度的图形表示代码变化，它明确地描述了旧版本和新版本之间的代码编辑操作，以及不同粒度的代码标记(即子标记和积分标记)。基于基于图的表示，FIRA通过生成模型生成提交消息，该模型包括基于图神经网络的编码器和基于变压器的解码器。为了使子令牌和积分令牌都成为提交消息生成的可用成分，解码器进一步结合了一种新的双复制机制。我们进一步进行了一项广泛的研究来评估FIRA的有效性。我们的定量结果表明，FIRA在BLEU、ROUGE-L和METEOR方面优于最先进的技术;我们的消融分析进一步表明，我们技术中的主要成分都对FIRA的有效性有积极的贡献。此外，我们进一步进行了一项人类研究，从开发人员的角度评估生成的提交消息的质量，结果一致地表明FIRA优于比较的技术。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)

自引率

0.00%

发文量