Framework to improve software effort estimation accuracy using novel ensemble rule
Authors: Syed Sarmad Ali, Jian Ren, Ji Wu
Journal: Journal of King Saud University - Computer and Information Sciences (impact factor 5.2; JCR Q1, Computer Science, Information Systems)
Published: 2024-09-20
DOI: 10.1016/j.jksuci.2024.102189
URL: https://www.sciencedirect.com/science/article/pii/S1319157824002787
Citation count: 0
Abstract
This study aims to improve software effort estimation (SEE), and with it project outcomes, in a rapidly evolving software industry. Accurate estimation is a cornerstone of project success: it is crucial for avoiding budget overruns and for minimizing the risk of project failure. The framework proposed in this article addresses three issues that are critical for accurate estimation: handling missing or inadequate data, selecting key features, and improving the software effort model. It incorporates three methods: the Novel Incomplete Value Imputation Model (NIVIM), a hybrid model that couples Correlation-based Feature Selection with a meta-heuristic algorithm (CFS-Meta), and the Heterogeneous Ensemble Model (HEM). Together, these methods make SEE more robust and accurate across varying project scenarios by handling missing data, optimizing feature selection, and integrating diverse predictive models. The framework significantly reduces imputation and feature selection overhead, while the ensemble approach optimizes model performance through dynamic weighting and meta-learning. The result is a lower mean absolute error (MAE) and reduced computational complexity, which makes the framework more effective on diverse software datasets.
NIVIM is designed to address the incomplete datasets that are prevalent in SEE. By generating synthetic data with a Variational Auto-Encoder (VAE), the model incorporates both contextual relevance and intrinsic project features, significantly enhancing estimation precision. Comparative analyses show that NIVIM outperforms existing models such as VAE, GAIN, K-NN, and MICE, achieving statistically significant improvements across six benchmark datasets, with average RMSE improvements ranging from 11.05% to 17.72% and MAE improvements ranging from 9.62% to 21.96%.
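The RMSE and MAE improvements quoted above are relative comparisons between imputation methods. The abstract does not state how these percentages were computed, so the following is only a minimal sketch, assuming the improvement is the relative error reduction against a baseline. The function names and all numeric values are hypothetical illustrations, not data from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_error, new_error):
    """Relative error reduction in percent (assumed definition, see lead-in)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical ground-truth values and two sets of imputed values.
true_values = [10.0, 20.0, 30.0, 40.0]
baseline_imputed = [14.0, 16.0, 34.0, 36.0]   # stands in for a baseline imputer
candidate_imputed = [12.0, 18.0, 32.0, 38.0]  # stands in for an improved imputer

mae_gain = improvement_pct(mae(true_values, baseline_imputed),
                           mae(true_values, candidate_imputed))
rmse_gain = improvement_pct(rmse(true_values, baseline_imputed),
                            rmse(true_values, candidate_imputed))
print(f"MAE improvement: {mae_gain:.2f}%, RMSE improvement: {rmse_gain:.2f}%")
```

Both metrics are computed only on entries whose true value is known, which is how imputation quality is usually assessed when values are masked artificially.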
Our second method, CFS-Meta, balances global optimization with local search techniques, substantially enhancing predictive capability. To assess its efficiency, CFS-Meta was compared with single and hybrid feature selection models, demonstrating up to a 25.61% reduction in MSE. In terms of MAE, CFS-Meta achieves a 10% improvement over the hybrid PSO-SA model, an 11.38% improvement over the hybrid ABC-SA model, and improvements of 12.42% and 12.703% over the hybrid Tabu-GA and hybrid ACO-COA models, respectively.
Our third method, HEM, is an ensemble effort estimation (EEE) model that combines diverse standalone models through a Dynamic Weight Adjustment-stacked combination (DWSC) rule. Tested against international benchmarks and industry datasets, HEM improves on the standalone models by an average of 21.8% (Pred()) and on the homogeneous ensemble model by 15% (Pred()). Together, these methods advance software project management (SPM) through predictive modeling and set a new benchmark for software engineering effort estimation.
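The Pred() measure and the idea of weighting heterogeneous models can be illustrated with a short sketch. It is not the paper's DWSC rule, whose details the abstract does not give. It assumes a simple stand-in scheme where each base model's weight is inversely proportional to its validation error. The abstract also leaves the Pred() level unspecified; the default of 0.25 used here is a common choice in the effort estimation literature and is an assumption. All function names and numbers are hypothetical.

```python
def pred(actual, predicted, level=0.25):
    """Fraction of estimates whose magnitude of relative error is <= level.

    Assumes every actual effort is positive. The default level of 0.25 is
    an assumption; the abstract does not state which level it uses.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


def inverse_error_weights(validation_errors, eps=1e-9):
    """Weights proportional to 1 / error, normalized to sum to 1.

    A stand-in weighting scheme, not the DWSC rule from the paper.
    """
    inverse = [1.0 / (e + eps) for e in validation_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(predictions_per_model, weights):
    """Weighted average of the base models' predictions, project by project."""
    n_projects = len(predictions_per_model[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions_per_model))
        for i in range(n_projects)
    ]


# Hypothetical actual efforts and predictions from two base models.
actual_effort = [100.0, 200.0, 400.0]
model_a = [120.0, 180.0, 400.0]
model_b = [80.0, 260.0, 480.0]

# Hypothetical validation MAEs for the two models (lower is better).
weights = inverse_error_weights([10.0, 30.0])
ensemble = combine([model_a, model_b], weights)

print("weights:", [round(w, 3) for w in weights])
print("Pred(0.25) model_b:", round(pred(actual_effort, model_b), 3))
print("Pred(0.25) ensemble:", round(pred(actual_effort, ensemble), 3))
```

In this toy setup the model with the lower validation error receives the larger weight, so the combined estimate stays close to the stronger model while still drawing on the weaker one. A stacked or dynamically adjusted rule would instead learn or update these weights from data.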
About the journal:
In 2022 the Journal of King Saud University - Computer and Information Sciences will become an author-paid open access journal. Authors who submit their manuscripts after October 31st, 2021 will be asked to pay an Article Processing Charge (APC) after acceptance of their paper to make their work immediately, permanently, and freely accessible to all. The Journal of King Saud University - Computer and Information Sciences is a refereed, international journal that covers all aspects of both the foundations of computing and its practical applications.