Title: A Learning-Based Bug Predicition Method for Object-Oriented Systems
Authors: Fikret Aktas, F. Buzluca
Published in: 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), 2018-06-01
DOI: https://doi.org/10.1109/ICIS.2018.8466535
Citation count: 7
Abstract
As today's software systems grow in size and complexity, the number of structurally defective classes in a project also increases unless the necessary precautions are taken. In this study, we propose a machine-learning-based approach to detect defective classes, which generate most of the errors found during testing. Our objective is to help software developers and testers predict error-prone classes, eliminate design defects, and reduce testing costs. In learning-based methods, the dataset used to train the model strongly affects the accuracy of the detection system. Therefore, we focus on the steps of constructing a proper dataset from different metrics collected from existing software projects. First, we use the rate of errors generated by a class to label it as "Clean" or "Buggy". Second, we use the CFS (Correlation-based Feature Selection) and PCA (Principal Component Analysis) methods to obtain the most appropriate subset of metrics. This feature selection process improves both the understandability and the detection performance of the model. Lastly, we apply the Random Forest classification method to identify error-prone classes. We evaluated our approach on five datasets collected from various open-source Eclipse subprojects. The results show that our approach is successful in building learning-based models for detecting error-prone classes. However, we also observed that different models should be created for different software systems, because each project has its own characteristics.
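The first two steps of the abstract (labeling by error rate, then correlation-based feature selection) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 0.25 labeling threshold, the metric names (`wmc`, `cbo`, `dit`), the toy values, and the use of Pearson correlation with a greedy forward search. The merit function follows the commonly cited CFS heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), which rewards metrics that correlate with the label and penalizes metrics that correlate with each other. The PCA and Random Forest steps are not shown.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def label_classes(error_rates, threshold):
    """Label each class 1 ("Buggy") or 0 ("Clean"); the threshold is a hypothetical cut-off."""
    return [1 if rate > threshold else 0 for rate in error_rates]

def cfs_merit(subset, features, labels):
    """CFS merit of a metric subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # Mean absolute correlation between each metric and the label.
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute correlation between every pair of metrics in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(features, labels):
    """Greedy forward search: keep adding the metric that most improves the merit."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, name = max(
            (cfs_merit(selected + [f], features, labels), f) for f in remaining
        )
        if merit <= best:  # no candidate improves the subset, so stop
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best

# Toy data for eight classes (synthetic, for illustration only).
error_rates = [0.0, 0.05, 0.1, 0.0, 0.5, 0.6, 0.4, 0.7]
features = {
    "wmc": [5, 6, 5, 7, 15, 16, 14, 18],  # tracks the label closely
    "cbo": [2, 3, 2, 4, 9, 10, 8, 11],    # also predictive, but redundant with wmc
    "dit": [1, 2, 1, 2, 1, 2, 1, 2],      # unrelated to the label
}

labels = label_classes(error_rates, threshold=0.25)
selected, merit = cfs_forward_select(features, labels)
print(labels)
print(selected, round(merit, 3))
```

On this toy data the search keeps only `wmc`: `cbo` is almost perfectly correlated with it and adds no new information, and `dit` has no correlation with the label. The selected metrics would then feed the classifier, for example a Random Forest as in the abstract.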