A Learning-Based Bug Prediction Method for Object-Oriented Systems

2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS). Pub Date: 2018-06-01. DOI: 10.1109/ICIS.2018.8466535

Fikret Aktas, F. Buzluca


Citations: 7

Abstract

As today's software systems grow in size and complexity, the number of structurally defective classes in a project also increases unless the necessary precautions are taken. In this study, we propose a machine-learning-based approach to detect defective classes, which generate most of the errors found in testing. Our objective is to help software developers and testers predict error-prone classes, eliminate design defects, and reduce testing costs. In learning-based methods, the dataset used to train the model strongly affects the accuracy of the detection system. Therefore, we focus on the steps of constructing a proper dataset using different metrics collected from existing software projects. First, we consider the rate of errors generated by a class to label it as "Clean" or "Buggy". Second, we use the CFS (Correlation-based Feature Selection) and PCA (Principal Component Analysis) methods to obtain the most appropriate subset of metrics. This feature selection process increases the understandability and the detection performance of the model. Lastly, we apply the Random Forest classification method to determine error-prone classes. We evaluated our approach using five different datasets that include data collected from various open-source Eclipse subprojects. The results show that our approach is successful in building learning-based models for detecting error-prone classes. However, we also observed that different models should be created for different software systems, because each project has its own character.
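
The first two dataset-construction steps described in the abstract (labeling by error rate, then correlation-based feature selection) can be illustrated with a minimal sketch. It is not the authors' implementation. The labeling threshold of 0.2, the definition of error rate as errors per test run, the metric names (wmc, loc, dit), and all data values are hypothetical. The merit function follows the standard CFS formula, but Pearson correlation is assumed here as the correlation measure, and the greedy forward search is one common search strategy rather than one confirmed by the abstract. PCA and Random Forest are left out; in practice they would usually come from a machine-learning library.

```python
import math

def label_class(errors, runs, threshold=0.2):
    """Label a class from its error rate (threshold is a hypothetical choice)."""
    rate = errors / runs if runs else 0.0
    return "Buggy" if rate > threshold else "Clean"

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(subset, features, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # Mean absolute feature-to-target correlation.
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-to-feature correlation over all pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(features, target, min_gain=1e-12):
    """Greedy forward search: add the best feature while the merit improves."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, name = max(
            (cfs_merit(selected + [f], features, target), f) for f in remaining
        )
        if merit <= best + min_gain:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best

# Hypothetical data for six classes: observed errors and test runs per class.
errors = [0, 4, 1, 6, 0, 5]
runs = [10, 10, 10, 10, 10, 10]
labels = [label_class(e, r) for e, r in zip(errors, runs)]
target = [1 if lab == "Buggy" else 0 for lab in labels]

# Hypothetical metrics: loc is fully redundant with wmc, dit is uninformative.
metrics = {
    "wmc": [5, 20, 7, 30, 4, 25],
    "loc": [50, 200, 70, 300, 40, 250],
    "dit": [2, 2, 3, 3, 1, 1],
}
selected, merit = cfs_forward_select(metrics, target)
print(labels)
print(selected, round(merit, 3))
```

In this toy dataset the search keeps only one of the two redundant size metrics and drops the uninformative one, which is the behavior that makes the reduced metric set easier to interpret.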

