{"title":"Nonsensical and biased correlation due to pooling heterogeneous samples","authors":"Uwe Hassler, Thorsten Thadewald","doi":"10.1111/1467-9884.00365","DOIUrl":null,"url":null,"abstract":"<p><b>Summary.</b> The case of two variables is considered, where the sample consists of two heterogeneous groups. The behaviour of the pooled sample correlation coefficient is studied. The heterogeneity of the two groups may be interpreted as a hidden qualitative variable. It is shown that, even if the correlation is the same within both groups, the pooled correlation coefficient may be severely biased owing to heterogeneity of other group-specific parameters. In the case of uncorrelatedness, nonsensical correlation may arise from pooled estimation. These and further results are obtained and can be quantified or forecast from an asymptotic formula for the pooled sample correlation coefficient, which is well reproduced in finite sample computer experiments and illustrated with empirical examples.</p>","PeriodicalId":100846,"journal":{"name":"Journal of the Royal Statistical Society: Series D (The Statistician)","volume":"52 3","pages":"367-379"},"PeriodicalIF":0.0000,"publicationDate":"2003-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/1467-9884.00365","citationCount":"42","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Royal Statistical Society: Series D (The Statistician)","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/1467-9884.00365","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 42
Abstract
Summary. The case of two variables is considered, where the sample consists of two heterogeneous groups. The behaviour of the pooled sample correlation coefficient is studied. The heterogeneity of the two groups may be interpreted as a hidden qualitative variable. It is shown that, even if the correlation is the same within both groups, the pooled correlation coefficient may be severely biased owing to heterogeneity of other group-specific parameters. In the case of uncorrelatedness, nonsensical correlation may arise from pooled estimation. These and further results are obtained and can be quantified or forecast from an asymptotic formula for the pooled sample correlation coefficient, which is well reproduced in finite sample computer experiments and illustrated with empirical examples.
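The effect described in the summary can be illustrated with a small simulation. The sketch below is not taken from the paper: it uses the standard law of total covariance for a two-group mixture as a stand-in for the asymptotic formula the authors mention, and all function names, parameter values, and the choice of bivariate normal groups are assumptions made for illustration only. Both groups are uncorrelated internally, yet a shift in the group means produces a sizeable pooled correlation.

```python
import numpy as np


def mixture_correlation(p, mean1, mean2, sd1, sd2, rho1, rho2):
    """Population correlation of a two-group mixture.

    Illustrative helper (not from the paper). p is the share of group 1;
    mean*/sd* are (x, y) pairs; rho* are within-group correlations.
    Uses the law of total (co)variance: within-group part plus
    between-group part driven by the difference in means.
    """
    q = 1.0 - p
    dx = mean1[0] - mean2[0]
    dy = mean1[1] - mean2[1]
    cov = p * rho1 * sd1[0] * sd1[1] + q * rho2 * sd2[0] * sd2[1] + p * q * dx * dy
    var_x = p * sd1[0] ** 2 + q * sd2[0] ** 2 + p * q * dx ** 2
    var_y = p * sd1[1] ** 2 + q * sd2[1] ** 2 + p * q * dy ** 2
    return cov / np.sqrt(var_x * var_y)


def simulate_pooled(n1, n2, mean1, mean2, sd1, sd2, rho1, rho2, seed=0):
    """Draw two bivariate normal groups; return (r_group1, r_group2, r_pooled)."""
    rng = np.random.default_rng(seed)

    def draw(n, mean, sd, rho):
        off_diag = rho * sd[0] * sd[1]
        cov = [[sd[0] ** 2, off_diag], [off_diag, sd[1] ** 2]]
        return rng.multivariate_normal(mean, cov, size=n)

    def corr(sample):
        return np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]

    g1 = draw(n1, mean1, sd1, rho1)
    g2 = draw(n2, mean2, sd2, rho2)
    pooled = np.vstack([g1, g2])  # ignore group membership, as in pooled estimation
    return corr(g1), corr(g2), corr(pooled)


# Hypothetical setting: zero correlation within each group, means differ by 3.
params = dict(mean1=(0.0, 0.0), mean2=(3.0, 3.0),
              sd1=(1.0, 1.0), sd2=(1.0, 1.0), rho1=0.0, rho2=0.0)

r1, r2, r_pooled = simulate_pooled(5000, 5000, **params)
print("within-group correlations:", round(r1, 3), round(r2, 3))
print("pooled sample correlation:", round(r_pooled, 3))
print("mixture formula value:    ", round(mixture_correlation(0.5, **params), 3))
```

Under these assumed parameters the between-group term alone contributes a covariance of 0.25 × 3 × 3 = 2.25 against a pooled variance of 3.25 in each variable, so the pooled correlation is far from zero even though neither group shows any dependence. Setting the within-group correlations to a common nonzero value and varying only the means or standard deviations shows the bias case in the same way.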