Framework to improve software effort estimation accuracy using novel ensemble rule

IF 5.2 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of King Saud University-Computer and Information Sciences | Pub Date: 2024-09-20 | DOI: 10.1016/j.jksuci.2024.102189

Syed Sarmad Ali, Jian Ren, Ji Wu

Volume 36, Issue 9, Article 102189 | https://www.sciencedirect.com/science/article/pii/S1319157824002787

Citation count: 0

Abstract

This investigation focuses on refining software effort estimation (SEE) to enhance project outcomes amidst the rapid evolution of the software industry. Accurate estimation is a cornerstone of project success, crucial for avoiding budget overruns and minimizing the risk of project failures. The framework proposed in this article addresses three significant issues that are critical for accurate estimation: dealing with missing or inadequate data, selecting key features, and improving the software effort model.

Our proposed framework incorporates three methods: the Novel Incomplete Value Imputation Model (NIVIM), a hybrid model using Correlation-based Feature Selection with a meta-heuristic algorithm (CFS-Meta), and the Heterogeneous Ensemble Model (HEM). The combined framework synergistically enhances the robustness and accuracy of SEE by effectively handling missing data, optimizing feature selection, and integrating diverse predictive models for superior performance across varying project scenarios. The framework significantly reduces imputation and feature selection overhead, while the ensemble approach optimizes model performance through dynamic weighting and meta-learning. This results in lower mean absolute error (MAE) and reduced computational complexity, making it more effective for diverse software datasets.

NIVIM is engineered to address incomplete datasets prevalent in SEE. By integrating a synthetic data methodology through a Variational Auto-Encoder (VAE), the model incorporates both contextual relevance and intrinsic project features, significantly enhancing estimation precision. Comparative analyses reveal that NIVIM surpasses existing models such as VAE, GAIN, K-NN, and MICE, achieving statistically significant improvements across six benchmark datasets, with average RMSE improvements ranging from 11.05% to 17.72% and MAE improvements from 9.62% to 21.96%.
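The imputation comparison above is reported as relative RMSE and MAE improvements. The following is a minimal sketch of how such a comparison is commonly set up, assuming a hide-and-impute protocol on complete data; the abstract does not describe the authors' exact procedure. The column-mean imputer is only a stand-in baseline, not NIVIM, and the function names (`evaluate_imputer`, `mean_impute`, `improvement_pct`) and the toy data are hypothetical.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error over paired values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error over paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_pct(baseline_error, new_error):
    """Relative error reduction of new_error versus baseline_error, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error


def mean_impute(rows):
    """Baseline imputer (not NIVIM): fill None cells with the observed column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if every value in the column is missing.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]


def evaluate_imputer(complete_rows, imputer, missing_rate=0.2, seed=0):
    """Hide a random share of known cells, impute them, and score the hidden cells."""
    rng = random.Random(seed)
    masked = [list(row) for row in complete_rows]
    hidden = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < missing_rate:
                hidden.append((i, j))
                row[j] = None
    if not hidden:
        raise ValueError("no cells were masked; raise missing_rate or change seed")
    imputed = imputer(masked)
    truth = [complete_rows[i][j] for i, j in hidden]
    guess = [imputed[i][j] for i, j in hidden]
    return rmse(truth, guess), mae(truth, guess)


if __name__ == "__main__":
    # Toy project table (size, team, effort); values are made up for illustration.
    data = [[10.0, 3.0, 120.0], [20.0, 5.0, 260.0],
            [15.0, 4.0, 190.0], [30.0, 8.0, 400.0]]
    print(evaluate_imputer(data, mean_impute, missing_rate=0.5))
```

Scoring two imputers on the same masked cells and passing their errors to `improvement_pct` gives a percentage of the kind quoted above. Fixing the seed keeps the masked cells identical across imputers, so the comparison is like for like.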
Our second method, CFS-Meta, balances global optimization with local search techniques, substantially enhancing predictive capabilities. The proposed CFS-Meta model was compared to single and hybrid feature selection models to assess its efficiency, demonstrating up to a 25.61% reduction in MSE. Additionally, CFS-Meta achieves a 10% (MAE) improvement over the hybrid PSO-SA model, an 11.38% (MAE) improvement over the hybrid ABC-SA model, and 12.42% and 12.703% (MAE) improvements over the hybrid Tabu-GA and hybrid ACO-COA models, respectively.

Our third method proposes an ensemble effort estimation (EEE) model that amalgamates diverse standalone models through a Dynamic Weight Adjustment-stacked combination (DWSC) rule. Tested against international benchmarks and industry datasets, the HEM method improves on the standalone model by an average of 21.8% (Pred()) and on the homogeneous ensemble model by 15% (Pred()). This comprehensive methodology underscores our model’s contributions to advancing software project management (SPM) through advanced predictive modeling, setting a new benchmark for software engineering effort estimation.
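To make the ensemble idea and the Pred() metric concrete, here is a minimal sketch under stated assumptions. It is not the paper's DWSC rule, which the abstract does not specify. Instead it weights each base model by the inverse of its validation MAE, a common simple weighting scheme. `pred_at` computes the share of projects whose magnitude of relative error (MRE) is at or below a threshold. The abstract leaves the threshold blank, so it is a required parameter here. All names and numbers are hypothetical.

```python
def mre(actual, predicted):
    """Magnitude of relative error for one project (actual effort must be > 0)."""
    return abs(actual - predicted) / actual


def pred_at(actuals, predictions, level):
    """Share of projects with MRE <= level, e.g. level=0.25 for a 25% tolerance."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


def inverse_error_weights(val_actuals, val_predictions_by_model, eps=1e-9):
    """Assumed weighting rule: weight each model by 1 / validation MAE, normalized."""
    inverse = {}
    for name, preds in val_predictions_by_model.items():
        model_mae = sum(abs(a - p) for a, p in zip(val_actuals, preds)) / len(val_actuals)
        inverse[name] = 1.0 / (model_mae + eps)  # eps avoids division by zero
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(predictions_by_model, weights):
    """Weighted average of the base models' predictions, project by project."""
    n = len(next(iter(predictions_by_model.values())))
    return [sum(weights[m] * predictions_by_model[m][i] for m in predictions_by_model)
            for i in range(n)]


if __name__ == "__main__":
    # Made-up validation efforts and predictions from two hypothetical base models.
    val_actuals = [100.0, 200.0]
    val_preds = {"model_a": [110.0, 210.0], "model_b": [120.0, 240.0]}
    weights = inverse_error_weights(val_actuals, val_preds)

    test_actuals = [100.0, 200.0, 400.0]
    test_preds = {"model_a": [120.0, 260.0, 390.0], "model_b": [90.0, 230.0, 450.0]}
    ensemble = combine(test_preds, weights)
    # 0.25 is a threshold often used in the SEE literature; the source does not
    # state which level its Pred() figures use.
    print(weights, pred_at(test_actuals, ensemble, level=0.25))
```

Recomputing the weights whenever new validation data arrives is one simple way to make the weighting dynamic. A stacked variant would replace the fixed averaging rule with a meta-learner trained on the base models' predictions.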

Source journal

Journal of King Saud University-Computer and Information Sciences (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 10.50

Self-citation rate: 8.70%

Articles published: 656

Review time: 29 days

About the journal: In 2022 the Journal of King Saud University - Computer and Information Sciences will become an author-paid open access journal. Authors who submit their manuscript after October 31st 2021 will be asked to pay an Article Processing Charge (APC) after acceptance of their paper to make their work immediately, permanently, and freely accessible to all. The Journal of King Saud University - Computer and Information Sciences is a refereed, international journal that covers all aspects of both the foundations of computing and its practical applications.