Software change classification using hunk metrics
Javed Ferzund, S. Ahsan, F. Wotawa
2009 IEEE International Conference on Software Maintenance, 2009-10-30
DOI: 10.1109/ICSM.2009.5306274 (https://doi.org/10.1109/ICSM.2009.5306274)
Citations: 8
Abstract
Change management is a challenging task in software maintenance. Software is changed throughout its lifetime, and some of these changes introduce errors into the code that result in failures. Software changes are composed of small code units called hunks, which are dispersed across source code files. In this paper we present a technique for classifying software changes based on hunk metrics. We classify individual hunks as buggy or bug-free, and thus provide an approach to bug prediction at the smallest level of granularity. We introduce a set of hunk metrics and build classification models on these metrics using logistic regression and random forests. We evaluated the performance of our approach on 7 open source software projects. On average, our approach classifies hunks as buggy or bug-free with 81 percent accuracy, 77 percent buggy hunk precision, and 67 percent buggy hunk recall. Most of the hunk metrics are significant predictors of bugs, but the set of significant metrics varies among projects.
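To make the unit of analysis concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-file diff in unified format, and the two metrics it computes (lines added and lines deleted per hunk) are hypothetical stand-ins, since the abstract does not list the paper's metric set. The names `split_hunks`, `hunk_metrics`, and `evaluate` are illustrative. No classifier is trained here; `evaluate` only shows how accuracy, buggy hunk precision, and buggy hunk recall would be computed from true and predicted labels.

```python
import re
from dataclasses import dataclass

# Header of a hunk in unified diff format, e.g. "@@ -1,3 +1,4 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


@dataclass
class HunkMetrics:
    # Hypothetical metrics chosen for illustration only.
    added: int
    deleted: int


def split_hunks(diff_text):
    """Split a single-file unified diff into hunks (lists of body lines).

    Lines before the first hunk header (the ---/+++ file header) are skipped.
    """
    hunks = []
    current = None
    for line in diff_text.splitlines():
        if HUNK_HEADER.match(line):
            current = []
            hunks.append(current)
        elif current is not None:
            current.append(line)
    return hunks


def hunk_metrics(hunk):
    """Count added ('+') and deleted ('-') lines in one hunk body."""
    added = sum(1 for line in hunk if line.startswith("+"))
    deleted = sum(1 for line in hunk if line.startswith("-"))
    return HunkMetrics(added=added, deleted=deleted)


def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall), treating 1 as the buggy class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

In a fuller pipeline, the per-hunk metric vectors would be the feature rows given to a logistic regression or random forest model, with one buggy or bug-free label per hunk.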