Software effort prediction models using maximum likelihood methods require multivariate normality of the software metrics data sample: can such a sample be made multivariate normal?

Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004. Pub Date: 2004-09-28. DOI: 10.1109/CMPSAC.2004.1342843

Victor K. Y. Chan


Citations: 1

Abstract

Missing data often appear in the software metrics data samples used to construct software effort prediction models. To date, the least biased, and therefore most strongly recommended, family of such models capable of handling missing data is the one based on maximum likelihood methods. However, the theory behind these methods assumes that the data sample underlying model construction is multivariate normal. Previous research on such models simply ignored the fact that empirical data samples violate this assumption. This paper proposes and empirically illustrates a simple but effective technique for transforming the data sample so that it meets the assumption. The technique is shown empirically to work for typical software metrics data samples, and the author recommends applying it in any further research on, and practical industrial application of, software effort prediction models using maximum likelihood methods.
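The abstract does not name the transformation it proposes, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a per-variable Box-Cox transform fitted on the observed (non-missing) values of each metric, and uses Mardia's multivariate skewness on complete cases as a rough before/after check. The synthetic two-column "size/effort" sample, the function names `boxcox_columns` and `mardia_skewness`, and the 10% missingness rate are all hypothetical choices made for the example.

```python
import numpy as np
from scipy import stats


def boxcox_columns(X):
    """Box-Cox transform each column separately, using only its observed values.

    Assumption for this sketch: every observed value is strictly positive,
    which Box-Cox requires. Missing entries (NaN) stay missing.
    """
    X = np.asarray(X, dtype=float)
    out = np.full_like(X, np.nan)
    lambdas = []
    for j in range(X.shape[1]):
        observed = ~np.isnan(X[:, j])
        transformed, lam = stats.boxcox(X[observed, j])
        out[observed, j] = transformed
        lambdas.append(lam)
    return out, lambdas


def mardia_skewness(X):
    """Mardia's multivariate skewness statistic, computed on complete cases."""
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]          # drop rows with any missing value
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n            # maximum likelihood covariance
    D = centered @ np.linalg.inv(S) @ centered.T
    return (D ** 3).sum() / n ** 2


# Hypothetical skewed sample: two correlated log-normal "metrics".
rng = np.random.default_rng(0)
latent = rng.multivariate_normal([3.0, 5.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
sample = np.exp(latent)
sample[rng.random(200) < 0.1, 1] = np.nan    # make ~10% of column 1 missing

transformed, lambdas = boxcox_columns(sample)
skew_before = mardia_skewness(sample)
skew_after = mardia_skewness(transformed)
print(f"Mardia skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

Making each variable look normal on its own does not guarantee joint normality, so a multivariate check such as the one above (or a formal test) would still be needed before relying on the maximum likelihood assumption.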
