Determining the Most Significant Metadata Features to Indicate Defective Software Commits
Authors: Rupam Dey, Anahita Khojandi, K. Perumalla
Venue: 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA)
Publication date: 2023-05-23
DOI: 10.1109/SERA57763.2023.10197721 (https://doi.org/10.1109/SERA57763.2023.10197721)
Citations: 0
Abstract
Defects are largely inevitable in the software development life cycle. Because they cannot be avoided entirely during development, they must be countered with limited time and budget. As in many other fields, machine learning models can help mitigate the problem by predicting both bug frequency and defective modules at different levels of granularity. However, a machine learning model is only as good as the set of features it is given, so care must be taken to select only the necessary features from the original set. In this study, we compared various machine learning models under different feature selection techniques and found that random forest-based models combined with wrapper methods performed best. These models detected all of the buggy classes in the validation data set.
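To illustrate what a wrapper method around a random forest might look like, the following is a minimal sketch, not the authors' pipeline. It assumes scikit-learn, uses recursive feature elimination (RFE) as one common wrapper method (the abstract does not say which wrapper was used), and runs on synthetic data rather than commit metadata. The function name `select_and_evaluate` and all parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split


def select_and_evaluate(n_keep=5, seed=0):
    """Wrapper-style feature selection with a random forest (illustrative only)."""
    # Synthetic, imbalanced stand-in for commit metadata (assumption):
    # 20 candidate features, roughly 20% "defective" samples.
    X, y = make_classification(
        n_samples=600,
        n_features=20,
        n_informative=5,
        n_redundant=2,
        weights=[0.8, 0.2],
        random_state=seed,
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Wrapper method: RFE repeatedly refits the forest and drops the
    # least important feature until n_keep features remain.
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    selector = RFE(estimator=forest, n_features_to_select=n_keep, step=1)
    selector.fit(X_train, y_train)
    selected = [i for i, keep in enumerate(selector.support_) if keep]

    # Refit a fresh forest on the selected features only, then measure
    # recall on the defective class for the held-out validation split.
    final_model = RandomForestClassifier(n_estimators=50, random_state=seed)
    final_model.fit(selector.transform(X_train), y_train)
    y_pred = final_model.predict(selector.transform(X_val))
    recall = recall_score(y_val, y_pred)
    return selected, recall


if __name__ == "__main__":
    features, defect_recall = select_and_evaluate()
    print("Selected feature indices:", features)
    print("Recall on defective class:", defect_recall)
```

A recall of 1.0 on the defective class would correspond to detecting every buggy sample in the validation split, which is the kind of result the abstract reports. The synthetic data here makes no claim to reproduce it.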