Ensemble Machine Learning Model for Software Defect Prediction

Advances in Machine Learning & Artificial Intelligence Pub Date : 1900-01-01 DOI:10.33140/amlai.02.01.03

Citations: 10

Abstract

Software defect prediction is a significant activity in every software firm. It helps produce quality software through reliable defect prediction, defect elimination, and identification of modules that are susceptible to defects. Several researchers have proposed different software defect prediction approaches in the past. However, these conventional approaches suffer from low classification accuracy and are time-consuming and laborious. This paper aims to develop a novel multi-model ensemble machine-learning model for software defect prediction. An ensemble technique can reduce inconsistency between training and test datasets and eliminate bias in the training and testing phases of the model, thereby overcoming the drawbacks of existing software defect prediction techniques. To this end, this paper proposes a new ensemble machine-learning model for software defect prediction that uses k Nearest Neighbour (kNN), Generalized Linear Model with Elastic Net Regularization (GLMNet), and Linear Discriminant Analysis (LDA), with Random Forest as the base learner. Experiments were conducted with the proposed model on the CM1, JM1, KC3, and PC3 datasets from the NASA PROMISE repository using RStudio. The ensemble technique achieved an accuracy of 87.69% on CM1, 81.11% on JM1, 90.70% on PC3, and 94.74% on KC3, giving an average prediction accuracy of 88.56% across the four datasets. The proposed system was also compared with existing techniques from the literature in terms of AUC: the ensemble achieved an AUC of 87%, better than the seven other state-of-the-art techniques under consideration. The results demonstrate that the ensemble model effectively predicts defects in the PROMISE datasets, which are notorious for their noisy features and high dimensionality. This shows that ensemble machine learning is promising and points to the future of software defect prediction.
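The abstract does not include the authors' R code, so the following is only a rough Python sketch of the kind of ensemble it describes, built with scikit-learn under several loudly stated assumptions. The phrase "with Random Forest as base learner" is ambiguous; here it is read as a stacking arrangement in which a random forest combines the out-of-fold probabilities of the kNN, GLMNet, and LDA models. scikit-learn's elastic-net `LogisticRegression` stands in for R's glmnet, and a synthetic, imbalanced dataset from `make_classification` replaces the NASA PROMISE tables. The helper names `build_ensemble` and `evaluate` and all hyperparameters (`n_neighbors=5`, `l1_ratio=0.5`, `n_estimators=200`, `cv=5`, 21 features) are hypothetical choices, not the paper's.

```python
"""Stacking-ensemble sketch; every setting below is an illustrative assumption."""
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_ensemble(seed: int = 0) -> StackingClassifier:
    """Hypothetical helper: kNN + elastic-net GLM + LDA, stacked by a random forest."""
    base_learners = [
        # kNN is distance-based, so features are standardized first.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        # Elastic-net logistic regression as a stand-in for R's glmnet;
        # 'saga' is the scikit-learn solver that supports the elasticnet penalty.
        ("glmnet", make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, C=1.0, max_iter=5000),
        )),
        ("lda", LinearDiscriminantAnalysis()),
    ]
    # Assumption: the random forest is the meta-learner that combines the
    # base learners' cross-validated class probabilities.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=RandomForestClassifier(n_estimators=200, random_state=seed),
        stack_method="predict_proba",
        cv=5,
    )


def evaluate(seed: int = 0) -> tuple[float, float]:
    """Hypothetical helper: fit on synthetic data, return (accuracy, AUC)."""
    # Synthetic, imbalanced stand-in for a PROMISE table (about 15% defective modules).
    X, y = make_classification(n_samples=500, n_features=21, n_informative=8,
                               weights=[0.85, 0.15], random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    model = build_ensemble(seed).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # AUC uses the predicted probability of the positive (defective) class.
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return acc, auc


if __name__ == "__main__":
    acc, auc = evaluate()
    print(f"accuracy={acc:.3f}  AUC={auc:.3f}")
```

Applying this to a real PROMISE table would mean loading its static-code metric columns into `X` and its defect label into `y`. The scores printed on synthetic data say nothing about the results reported in the abstract.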

Source Journal

Advances in Machine Learning & Artificial Intelligence

Self-citation Rate

0.00%

Publication Count