How to Make Best Use of Cross-Company Data for Web Effort Estimation?

2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) Pub Date : 2015-10-01 DOI:10.1109/ESEM.2015.7321199

Leandro L. Minku, Federica Sarro, E. Mendes, F. Ferrucci


Citations: 34

Abstract

[Context]: The numerous challenges that can hinder software companies from gathering their own data have, over the past 15 years, motivated research on the use of cross-company (CC) datasets for software effort prediction. Part of this research has focused on Web effort prediction, given the large worldwide increase in the development of Web applications. Some of these studies indicate that CC models may perform better if some strategy is adopted to make the CC data more similar to the within-company (WC) data. [Goal]: This study uses a recently proposed approach called Dycom to assess how effective Web effort predictions obtained using CC datasets are, relative to predictions obtained using WC data, when the CC models are explicitly mapped to the WC context. [Method]: Data on 125 Web projects from eight different companies, part of the Tukutuku database, were used to build prediction models. We benchmarked these models against baseline models (mean and median effort) and a WC base learner that does not benefit from the mapping. We also compared Dycom against a competitive CC approach from the literature (NN-filtering). We report a company-by-company analysis. [Results]: Dycom usually achieved performance similar to or better than a WC model while using only half of the WC training data. These results also improve on previous studies that investigated different strategies for adapting CC models to the WC data for Web effort estimation. [Conclusions]: We conclude that the use of Dycom for Web effort prediction is quite promising and in general supports previous results obtained when applying Dycom to conventional software datasets.
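
The mean and median effort baselines named in the Method can be illustrated with a minimal sketch. It assumes each baseline predicts one constant value (the mean or the median of the training efforts) for every new project, and it scores predictions with mean absolute error; the abstract does not state the error measure, so that choice is an assumption. The function names and the effort values below are hypothetical and are not taken from the Tukutuku database.

```python
from statistics import mean, median


def baseline_predictions(train_efforts):
    """Return the constant prediction made by each baseline.

    Assumption: a baseline ignores project features and always predicts
    the mean (or median) effort observed in the training projects.
    """
    return {"mean": mean(train_efforts), "median": median(train_efforts)}


def mean_absolute_error(actual_efforts, constant_prediction):
    """Average absolute difference between actual efforts and one constant.

    Assumption: MAE is used only for illustration; the abstract does not
    say which performance measure the study reports.
    """
    return mean(abs(a - constant_prediction) for a in actual_efforts)


# Hypothetical effort values (person-hours), invented for this example.
train = [10.0, 20.0, 30.0, 100.0]
test = [25.0, 35.0]

baselines = baseline_predictions(train)
errors = {
    name: mean_absolute_error(test, prediction)
    for name, prediction in baselines.items()
}

for name in baselines:
    print(f"{name} baseline predicts {baselines[name]}, MAE = {errors[name]}")
```

With skewed training efforts such as these, the median baseline is less affected by the single large project than the mean baseline, which is one reason both are commonly reported side by side.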

