Stats & Maths & Unicorns
Raymond A. Anderson
Credit Intelligence & Modelling, published 2021-11-29
DOI: https://doi.org/10.1093/oso/9780192844194.003.0011
Abstract
This chapter covers basic statistical concepts. Most of the statistics relate to hypothesis testing; the others relate to variable selection and model fitting. The chapter’s name reflects the fact that an exact match between a theoretical and an empirical distribution is as rare as a unicorn. (1) Dispersion: measures of random variation, including variance and the variance inflation factor, covariance and correlations {Pearson’s product-moment, Spearman’s rank-order}, and the Mahalanobis distance. (2) Goodness-of-fit: do observations match expectations? This applies to both continuous dependent variables {R-squared and adjusted R-squared} and categorical ones {Pearson’s chi-square, Hosmer–Lemeshow statistic}. (3) Likelihood: assesses how well estimates fit binary dependent variables {log-likelihood, deviance}, plus the Akaike and Bayesian information criteria, which penalize complexity. (4) The Holy Trinity of Statistics: i) Neyman–Pearson’s ‘likelihood ratio’, the basis for model comparisons; ii) Wald’s chi-square, for potential variable removal; iii) Rao’s score chi-square, for potential variable inclusion. All of these are used in Logistic Regression.
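As a rough illustration of the likelihood measures named in item (3) and the likelihood ratio in item (4), the following Python sketch computes the log-likelihood, deviance, AIC, BIC, and a likelihood-ratio statistic for a binary outcome. It is not taken from the chapter: the data, the probabilities in `p_model`, and the parameter count `k_model` are invented for illustration, and the function names are hypothetical. The probabilities are not fitted by maximum likelihood, so the resulting numbers only show how the formulas relate to one another.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )


def deviance(y, p):
    """Deviance for ungrouped binary data, where the saturated log-likelihood is 0."""
    return -2.0 * log_likelihood(y, p)


def aic(ll, k):
    """Akaike information criterion: penalty of 2 per estimated parameter."""
    return 2 * k - 2 * ll


def bic(ll, k, n):
    """Bayesian information criterion: penalty of ln(n) per estimated parameter."""
    return k * math.log(n) - 2 * ll


# Hypothetical observations and predicted probabilities (assumed, not fitted).
y = [1, 0, 1, 1, 0, 0, 1, 0]
p_model = [0.8, 0.3, 0.7, 0.9, 0.2, 0.4, 0.6, 0.1]
k_model = 2  # assumed: intercept plus one predictor

# Null model: every case gets the observed base rate.
base_rate = sum(y) / len(y)
p_null = [base_rate] * len(y)

ll_model = log_likelihood(y, p_model)
ll_null = log_likelihood(y, p_null)

# Likelihood-ratio statistic: twice the gain in log-likelihood over the null model.
lr_stat = 2 * (ll_model - ll_null)

print(f"log-likelihood (model): {ll_model:.3f}")
print(f"log-likelihood (null):  {ll_null:.3f}")
print(f"deviance (model):       {deviance(y, p_model):.3f}")
print(f"AIC (model):            {aic(ll_model, k_model):.3f}")
print(f"BIC (model):            {bic(ll_model, k_model, len(y)):.3f}")
print(f"likelihood ratio:       {lr_stat:.3f}")
```

For properly fitted nested models, the likelihood-ratio statistic would be compared against a chi-square distribution with degrees of freedom equal to the number of added parameters; that step is omitted here because the example probabilities are not maximum-likelihood estimates.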