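Software effort prediction models using maximum likelihood methods require multivariate normality of the software metrics data sample: can such a sample be made multivariate normal?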

Victor K. Y. Chan
Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004 (COMPSAC 2004). Published 2004-09-28.
DOI: 10.1109/CMPSAC.2004.1342843 (https://doi.org/10.1109/CMPSAC.2004.1342843)
Citations: 1

Abstract

Missing data often appear in the software metrics data samples used to construct software effort prediction models. So far, the least biased, and thus the most strongly recommended, family of such models capable of handling missing data is the one using maximum likelihood methods. However, the theory behind these methods assumes that the data samples underlying the model construction are multivariate normal. Previous research on such models simply ignored the violation of this assumption by the empirical data samples. This paper proposes and empirically illustrates a not-so-complicated but effective technique for transforming the data sample so that it meets the assumption. The technique is shown empirically to work for typical software metrics data samples, and the author recommends applying it in any further research on, and practical industrial application of, software effort prediction models using maximum likelihood methods.
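The abstract does not name the transformation it proposes, so the following is only a minimal sketch of the general idea, not the author's technique. It assumes a plain log transformation (a common choice for right-skewed size and effort metrics) and uses Mardia's multivariate skewness and kurtosis as the diagnostic. The two-variable sample (`size`, `effort`) is synthetic and hypothetical, contains no missing values, and the helper name `mardia` is introduced here for illustration only.

```python
import math
import random


def mardia(rows):
    """Return Mardia's multivariate skewness and kurtosis for 2-column data."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    centered = [(r[0] - mean_x, r[1] - mean_y) for r in rows]

    # Maximum likelihood covariance estimate (divide by n, not n - 1).
    sxx = sum(a * a for a, _ in centered) / n
    syy = sum(b * b for _, b in centered) / n
    sxy = sum(a * b for a, b in centered) / n

    # Closed-form inverse of the 2x2 covariance matrix.
    det = sxx * syy - sxy * sxy
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det

    def form(u, v):
        # Bilinear form u^T S^{-1} v.
        return u[0] * (ixx * v[0] + ixy * v[1]) + u[1] * (ixy * v[0] + iyy * v[1])

    skewness = sum(form(u, v) ** 3 for u in centered for v in centered) / n ** 2
    kurtosis = sum(form(u, u) ** 2 for u in centered) / n
    return skewness, kurtosis


random.seed(0)

# Hypothetical sample: correlated, right-skewed (log-normal) size and effort.
rows = []
for _ in range(300):
    z1 = random.gauss(0, 1)
    z2 = 0.8 * z1 + 0.6 * random.gauss(0, 1)
    rows.append((math.exp(4 + z1), math.exp(6 + z2)))

# Assumed transformation: natural log of every variable.
logged = [(math.log(size), math.log(effort)) for size, effort in rows]

raw_skew, raw_kurt = mardia(rows)
log_skew, log_kurt = mardia(logged)

# For a bivariate normal sample, skewness is near 0 and kurtosis near p(p+2) = 8.
print(f"raw:    skewness={raw_skew:.2f}  kurtosis={raw_kurt:.2f}")
print(f"logged: skewness={log_skew:.2f}  kurtosis={log_kurt:.2f}")
```

On real software metrics data, a log transformation may not be enough, and a sample with missing values would need the statistics computed on the observed cases or by another suitable method; the sketch only shows how a transformation can be checked against the multivariate normality assumption.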