Importance and Aptitude of Source Code Density for Commit Classification into Maintenance Activities
Sebastian Hönel, Morgan Ericsson, Welf Löwe, Anna Wingkvist
2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS), published 2019-07-01
DOI: 10.1109/QRS.2019.00027 (https://doi.org/10.1109/QRS.2019.00027)
Citations: 14
Abstract
Commit classification, the automatic classification of the purpose of changes to software, can support the understanding and quality improvement of software and its development process. We introduce the code density of a commit, a measure of its net size, as a novel feature and study how well it is suited to determining the purpose of a change. We also compare the accuracy of code-density-based classification with that of existing size-based classification. By applying standard classification models, we demonstrate the significance of code density for the accuracy of commit classification. We achieve up to 89% accuracy and a Kappa of 0.82 for cross-project commit classification, where the model is trained on one project and applied to other projects. Such highly accurate classification of the purpose of software changes helps to improve confidence in software (process) quality analyses that exploit this classification information.
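To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and Cohen's Kappa can be computed for a set of classified commits. It is not the authors' implementation. The `code_density` helper assumes that density is the ratio of a commit's net size to its gross size; the abstract only describes it as a measure of net size, so this definition, the function names, the line counts, and the labels are all hypothetical illustrations.

```python
from collections import Counter


def code_density(net_lines: int, gross_lines: int) -> float:
    """ASSUMED definition: share of a commit's changed lines that remain
    once non-functional lines (e.g. comments, blank lines) are excluded."""
    if gross_lines <= 0:
        return 0.0  # an empty commit has no density to speak of
    return net_lines / gross_lines


def accuracy(y_true, y_pred) -> float:
    """Fraction of commits whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred) -> float:
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies per class.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both label lists contain one and the same single class
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for six commits (illustrative data, not from the paper).
y_true = ["adaptive", "adaptive", "corrective", "corrective", "perfective", "perfective"]
y_pred = ["adaptive", "adaptive", "corrective", "perfective", "perfective", "perfective"]

print(code_density(net_lines=30, gross_lines=120))
print(accuracy(y_true, y_pred))
print(cohen_kappa(y_true, y_pred))
```

In this toy example five of six commits are labeled correctly, so accuracy is about 0.83, while Kappa is 0.75 because one third of the agreement would be expected by chance. Kappa is the stricter of the two figures, which is why it is reported alongside accuracy.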