{"title":"使用隐马尔可夫模型从代码差异生成提交消息","authors":"Ahmed Awad, K. Nagaty","doi":"10.1145/3328833.3328873","DOIUrl":null,"url":null,"abstract":"Commit messages are developer-written messages that document code changes. Such change might be adding features, fixing bugs or simply code updates. Although these messages help in understanding the evolution of any software, it is quite often that developers disregard the process of writing these messages, when making a change. Many automated methods have been proposed to generate commit messages. Due to the inability of those techniques to represent higher order understanding of code changes, the quality of these messages in terms of logic and context representation is very low as opposed to developer written messages. To solve this problem, previous work used deep learning models -specifically, sequence-to-sequence models- were used to automate that task. This model delivered promising results on translating code differences to commit messages. However, after the model's performance was thoroughly investigated in previous work. It was found out that code differences corresponding to almost every high quality commit messages generated by the model were very similar to one or more training sample code differences on a token level. Motivated by that observation, a k-nearest neighbor algorithm that outputs the same exact message of the nearest code difference was proposed in previous work. Inspired by the traditional solution to sequence modeling; Hidden Markov Models, we show that HMMs outperforms sequence-to-sequence models without outputting the same exact message of the nearest code diff, our experiments show an enhancement of 4% against sequence to sequence models.","PeriodicalId":172646,"journal":{"name":"Proceedings of the 8th International Conference on Software and Information Engineering","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Commit Message Generation from Code Differences using Hidden Markov Models\",\"authors\":\"Ahmed Awad, K. Nagaty\",\"doi\":\"10.1145/3328833.3328873\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Commit messages are developer-written messages that document code changes. Such change might be adding features, fixing bugs or simply code updates. Although these messages help in understanding the evolution of any software, it is quite often that developers disregard the process of writing these messages, when making a change. Many automated methods have been proposed to generate commit messages. Due to the inability of those techniques to represent higher order understanding of code changes, the quality of these messages in terms of logic and context representation is very low as opposed to developer written messages. To solve this problem, previous work used deep learning models -specifically, sequence-to-sequence models- were used to automate that task. This model delivered promising results on translating code differences to commit messages. However, after the model's performance was thoroughly investigated in previous work. It was found out that code differences corresponding to almost every high quality commit messages generated by the model were very similar to one or more training sample code differences on a token level. Motivated by that observation, a k-nearest neighbor algorithm that outputs the same exact message of the nearest code difference was proposed in previous work. Inspired by the traditional solution to sequence modeling; Hidden Markov Models, we show that HMMs outperforms sequence-to-sequence models without outputting the same exact message of the nearest code diff, our experiments show an enhancement of 4% against sequence to sequence models.\",\"PeriodicalId\":172646,\"journal\":{\"name\":\"Proceedings of the 8th International Conference on Software and Information Engineering\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 8th International Conference on Software and Information Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3328833.3328873\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 8th International Conference on Software and Information Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3328833.3328873","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Commit Message Generation from Code Differences using Hidden Markov Models
Commit messages are developer-written messages that document code changes. Such change might be adding features, fixing bugs or simply code updates. Although these messages help in understanding the evolution of any software, it is quite often that developers disregard the process of writing these messages, when making a change. Many automated methods have been proposed to generate commit messages. Due to the inability of those techniques to represent higher order understanding of code changes, the quality of these messages in terms of logic and context representation is very low as opposed to developer written messages. To solve this problem, previous work used deep learning models -specifically, sequence-to-sequence models- were used to automate that task. This model delivered promising results on translating code differences to commit messages. However, after the model's performance was thoroughly investigated in previous work. It was found out that code differences corresponding to almost every high quality commit messages generated by the model were very similar to one or more training sample code differences on a token level. Motivated by that observation, a k-nearest neighbor algorithm that outputs the same exact message of the nearest code difference was proposed in previous work. Inspired by the traditional solution to sequence modeling; Hidden Markov Models, we show that HMMs outperforms sequence-to-sequence models without outputting the same exact message of the nearest code diff, our experiments show an enhancement of 4% against sequence to sequence models.