{"title":"使用最大似然方法的软件工作量预测模型需要软件度量数据样本的多变量正态性:这样的样本可以成为多变量正态性吗?","authors":"Victor K. Y. Chan","doi":"10.1109/CMPSAC.2004.1342843","DOIUrl":null,"url":null,"abstract":"Missing data often appear in software metrics data samples used to construct software effort prediction models. So far, the least biased and thus the most strongly recommended family of such models capable of handling missing data are those using maximum likelihood methods. However, the theory of such maximum likelihood methods assumes that the data samples underlying the model construction are multivariate normal. Previous research on such models simply ignored the violation of such an assumption by the empirical data samples. This paper proposes and empirically illustrates a not-so-complicated but effective technique to transform the data sample for the purpose of meeting such an assumption. This technique is empirically proven to work for typical software metrics data samples and the author recommends applying such a technique in any further research on and practical industrial application of software effort prediction models using maximum likelihood methods","PeriodicalId":355273,"journal":{"name":"Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Software effort prediction models using maximum likelihood methods require multivariate normality of the software metrics data sample: can such a sample be made multivariate normal?\",\"authors\":\"Victor K. Y. Chan\",\"doi\":\"10.1109/CMPSAC.2004.1342843\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Missing data often appear in software metrics data samples used to construct software effort prediction models. So far, the least biased and thus the most strongly recommended family of such models capable of handling missing data are those using maximum likelihood methods. However, the theory of such maximum likelihood methods assumes that the data samples underlying the model construction are multivariate normal. Previous research on such models simply ignored the violation of such an assumption by the empirical data samples. This paper proposes and empirically illustrates a not-so-complicated but effective technique to transform the data sample for the purpose of meeting such an assumption. This technique is empirically proven to work for typical software metrics data samples and the author recommends applying such a technique in any further research on and practical industrial application of software effort prediction models using maximum likelihood methods\",\"PeriodicalId\":355273,\"journal\":{\"name\":\"Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004.\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-09-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CMPSAC.2004.1342843\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CMPSAC.2004.1342843","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Software effort prediction models using maximum likelihood methods require multivariate normality of the software metrics data sample: can such a sample be made multivariate normal?
Missing data often appear in software metrics data samples used to construct software effort prediction models. So far, the least biased and thus the most strongly recommended family of such models capable of handling missing data are those using maximum likelihood methods. However, the theory of such maximum likelihood methods assumes that the data samples underlying the model construction are multivariate normal. Previous research on such models simply ignored the violation of such an assumption by the empirical data samples. This paper proposes and empirically illustrates a not-so-complicated but effective technique to transform the data sample for the purpose of meeting such an assumption. This technique is empirically proven to work for typical software metrics data samples and the author recommends applying such a technique in any further research on and practical industrial application of software effort prediction models using maximum likelihood methods