Commit Message Generation from Code Differences using Hidden Markov Models

Proceedings of the 8th International Conference on Software and Information Engineering. Pub Date: 2019-04-09. DOI: 10.1145/3328833.3328873

Ahmed Awad, K. Nagaty


Citations: 1

Abstract

Commit messages are developer-written messages that document code changes. Such changes might add features, fix bugs, or simply update code. Although these messages help in understanding the evolution of software, developers often neglect to write them when making a change. Many automated methods have been proposed to generate commit messages. Because those techniques cannot represent a higher-order understanding of code changes, the quality of the generated messages, in terms of logic and context representation, is very low compared with developer-written messages. To solve this problem, previous work used deep learning models, specifically sequence-to-sequence models, to automate the task. This model delivered promising results in translating code differences to commit messages. However, when the model's performance was thoroughly investigated in previous work, it was found that the code differences corresponding to almost every high-quality commit message generated by the model were very similar, at the token level, to one or more code differences in the training set. Motivated by that observation, previous work proposed a k-nearest neighbor algorithm that outputs the exact message of the nearest code difference. Inspired by the traditional solution to sequence modeling, Hidden Markov Models (HMMs), we show that HMMs outperform sequence-to-sequence models without outputting the exact message of the nearest code diff. Our experiments show an improvement of 4% over sequence-to-sequence models.
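
To make the nearest-neighbor baseline mentioned in the abstract concrete, the following is a minimal sketch, not the implementation from the prior work. It assumes whitespace tokenization and cosine similarity over token counts as the token-level similarity measure; the abstract does not specify either. The function names and the toy training pairs are hypothetical.

```python
from collections import Counter
from math import sqrt


def tokenize(diff: str) -> Counter:
    """Split a diff on whitespace and count tokens (bag of words)."""
    return Counter(diff.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_message(query_diff: str, training_pairs: list) -> str:
    """Return the message of the training diff most similar to query_diff."""
    query = tokenize(query_diff)
    _, best_message = max(
        training_pairs, key=lambda pair: cosine(query, tokenize(pair[0]))
    )
    return best_message


# Hypothetical toy training set of (diff, message) pairs.
TRAINING = [
    ("- return a - b + return a + b", "Fix sign error in add"),
    ("+ import logging + logging.basicConfig()", "Add logging setup"),
]

print(nearest_message("- return x - y + return x + y", TRAINING))
# prints: Fix sign error in add
```

A baseline of this kind can only ever return a message that already exists in the training set, which is the limitation the abstract contrasts with the HMM approach.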
