Software effort prediction models using maximum likelihood methods require multivariate normality of the software metrics data sample: can such a sample be made multivariate normal?
Victor K. Y. Chan
Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004 (COMPSAC 2004). Published 2004-09-28.
DOI: 10.1109/CMPSAC.2004.1342843 (https://doi.org/10.1109/CMPSAC.2004.1342843)
Citation count: 1
Abstract
Missing data often appear in the software metrics data samples used to construct software effort prediction models. Among the models capable of handling missing data, those using maximum likelihood methods are so far the least biased and thus the most strongly recommended family. However, the theory behind these maximum likelihood methods assumes that the data samples underlying model construction are multivariate normal. Previous research on such models simply ignored the violation of this assumption by empirical data samples. This paper proposes and empirically illustrates a relatively simple but effective technique for transforming the data sample so that it meets this assumption. The technique is shown empirically to work for typical software metrics data samples, and the author recommends applying it in any further research on, and practical industrial application of, software effort prediction models using maximum likelihood methods.