A Two-Stage Data Preprocessing Approach for Software Fault Prediction

2014 Eighth International Conference on Software Security and Reliability
Published: 2014-06-30
DOI: 10.1109/SERE.2014.15

Jiaqiang Chen, Shulong Liu, Wangshu Liu, Xiang Chen, Qing Gu, Daoxu Chen


Citations: 47

Abstract

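Software fault prediction estimates the fault proneness of software modules, so that limited testing resources can be allocated effectively for software quality assurance. Previous studies have shown that feature selection and instance reduction can each improve the performance of the classification models used for fault prediction. However, to the best of our knowledge, few researchers have combined the two and studied their joint effect on classification models. We therefore propose a novel two-stage data preprocessing approach that incorporates both feature selection and instance reduction. In the feature selection stage, we propose a new algorithm that combines feature selection with threshold-based clustering and covers both relevance analysis and redundancy control. In the instance reduction stage, we apply random sampling to balance the faulty and non-faulty classes. In empirical studies, we implemented five different data preprocessing schemes based on the proposed approach and compared the prediction performance of commonly used classification models. The results demonstrate the effectiveness of our approach and provide a guideline for achieving cost-effective data preprocessing with it.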
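The abstract names the two stages but not their details: it does not state the relevance measure, the redundancy threshold, how many features are kept, or the target class ratio. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' algorithm. It assumes absolute Pearson correlation as both the relevance measure (feature vs. label) and the redundancy measure (feature vs. feature), a greedy threshold-based clustering in which each cluster is represented by its most relevant feature, and random undersampling of the majority class to a 1:1 ratio. All function names and parameters (`select_features`, `undersample`, `two_stage_preprocess`, `threshold`, `k`, `seed`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(X: Matrix, y: List[int],
                    threshold: float = 0.8, k: int = 3) -> List[int]:
    """Stage 1 (assumed form): relevance analysis + threshold-based clustering.

    Assumption: relevance is |corr(feature, label)|, and a feature is redundant
    when its |corr| with an existing cluster representative is >= threshold.
    Returns the column indices of at most k representatives, most relevant first.
    """
    columns = [list(col) for col in zip(*X)]
    relevance = [abs(pearson(col, y)) for col in columns]
    # Visit features from most to least relevant, so each cluster is
    # represented by its most relevant member.
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    representatives: List[int] = []
    for j in order:
        redundant = any(abs(pearson(columns[j], columns[r])) >= threshold
                        for r in representatives)
        if not redundant:
            representatives.append(j)  # j opens a new cluster
        # otherwise j falls into an existing cluster and is discarded
    return representatives[:k]


def undersample(X: Matrix, y: List[int], seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Stage 2 (assumed form): randomly undersample the majority class to 1:1."""
    rng = random.Random(seed)
    faulty = [i for i, label in enumerate(y) if label == 1]
    clean = [i for i, label in enumerate(y) if label == 0]
    minority, majority = sorted([faulty, clean], key=len)
    # Keep every minority instance plus an equal-sized random subset of the majority.
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def two_stage_preprocess(X: Matrix, y: List[int], threshold: float = 0.8,
                         k: int = 3, seed: int = 0) -> Tuple[List[int], Matrix, List[int]]:
    """Feature selection first, then instance reduction on the reduced data."""
    feats = select_features(X, y, threshold, k)
    X_reduced = [[row[j] for j in feats] for row in X]
    X_bal, y_bal = undersample(X_reduced, y, seed)
    return feats, X_bal, y_bal


if __name__ == "__main__":
    # Toy data: column 1 is an exact multiple of column 0 (fully redundant).
    loc = [50, 60, 55, 10, 12, 8, 20, 15, 9, 11]
    noise = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    X = [[a, 2 * a, c] for a, c in zip(loc, noise)]
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    feats, X_bal, y_bal = two_stage_preprocess(X, y)
    print(feats)                    # one of columns 0/1, plus column 2
    print(len(y_bal), sum(y_bal))   # 6 3
```

Running feature selection before undersampling lets relevance and redundancy be estimated on all instances; the opposite order, or a different sampling ratio, are equally plausible variants of this sketch, since the abstract does not describe how its five schemes differ.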



