Effort estimation of ETL projects using Forward Stepwise Regression

Raza Rasool, Ali Afzal Malik
{"title":"Effort estimation of ETL projects using Forward Stepwise Regression","authors":"Raza Rasool, Ali Afzal Malik","doi":"10.1109/ICET.2015.7389209","DOIUrl":null,"url":null,"abstract":"Effort estimation is a key component of planning a software development project. In the past, there has been a lot of research on estimation methods for traditional applications but, unfortunately, these methods do not apply to Extract Transform Load (ETL) projects. Coming up with a systematic effort estimate for ETL projects is a challenging task since ETL development does not follow the traditional Software Development Life Cycle (SDLC). Traditional application development is requirements-driven whereas ETL application development is data-driven. This research paper describes the development of an effort estimation model for ETL projects and compares this model with the most widely used algorithmic effort estimation model i.e. COCOMO II. A dataset comprising 220 industrial projects from five different software houses is used to build this effort estimation model using Forward Stepwise Regression. After eliminating 20 outliers from this dataset, the adjusted R2 (i.e. goodness of fit) of our model is 0.87. The prediction and training accuracy of this model is measured using the de-facto standard accuracy measures such as MMRE and PRED(25). On a training dataset of 200 projects, the training accuracy value of PRED(25) is 81.16% and MMRE is 0.16. Results show that our proposed estimation model provides considerably better estimation accuracy as compared to COCOMO II. On a validation dataset of 58 projects, the value of PRED(25) was 49% for our model as compared to 21% for COCOMO II. Furthermore, the MMRE of our model is 0.31 as compared to 0.99 for COCOMO II.","PeriodicalId":166507,"journal":{"name":"2015 International Conference on Emerging Technologies (ICET)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Emerging Technologies (ICET)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICET.2015.7389209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Effort estimation is a key component of planning a software development project. In the past, there has been a lot of research on estimation methods for traditional applications but, unfortunately, these methods do not apply to Extract Transform Load (ETL) projects. Coming up with a systematic effort estimate for ETL projects is a challenging task since ETL development does not follow the traditional Software Development Life Cycle (SDLC). Traditional application development is requirements-driven whereas ETL application development is data-driven. This research paper describes the development of an effort estimation model for ETL projects and compares this model with the most widely used algorithmic effort estimation model i.e. COCOMO II. A dataset comprising 220 industrial projects from five different software houses is used to build this effort estimation model using Forward Stepwise Regression. After eliminating 20 outliers from this dataset, the adjusted R2 (i.e. goodness of fit) of our model is 0.87. The prediction and training accuracy of this model is measured using the de-facto standard accuracy measures such as MMRE and PRED(25). On a training dataset of 200 projects, the training accuracy value of PRED(25) is 81.16% and MMRE is 0.16. Results show that our proposed estimation model provides considerably better estimation accuracy as compared to COCOMO II. On a validation dataset of 58 projects, the value of PRED(25) was 49% for our model as compared to 21% for COCOMO II. Furthermore, the MMRE of our model is 0.31 as compared to 0.99 for COCOMO II.
基于正逐步回归的ETL项目工作量估算
工作量评估是规划软件开发项目的关键组成部分。过去,人们对传统应用的估计方法进行了大量的研究,但不幸的是,这些方法并不适用于提取转换负载(ETL)项目。由于ETL开发不遵循传统的软件开发生命周期(SDLC),因此对ETL项目进行系统的工作量评估是一项具有挑战性的任务。传统应用程序开发是需求驱动的,而ETL应用程序开发是数据驱动的。本文描述了ETL项目的工作量估算模型的开发,并将该模型与目前使用最广泛的算法工作量估算模型COCOMO II进行了比较。数据集包括来自五个不同软件公司的220个工业项目,使用前向逐步回归构建了这个工作量估计模型。从该数据集中剔除20个异常值后,我们的模型调整后的R2(即拟合优度)为0.87。该模型的预测和训练精度使用事实上的标准精度度量,如MMRE和PRED(25)来测量。在200个项目的训练数据集上,PRED(25)的训练准确率值为81.16%,MMRE为0.16。结果表明,与COCOMO II相比,我们提出的估计模型提供了明显更好的估计精度。在58个项目的验证数据集中,我们的模型的PRED(25)值为49%,而COCOMO II的PRED值为21%。此外,我们的模型的MMRE为0.31,而COCOMO II的MMRE为0.99。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信