Sally Hunsberger, Lori Long, Sarah E Reese, Gloria H Hong, Ian A Myles, Christa S Zerbe, Pleonchan Chetchotisakd, Joanna H Shih
Rank correlation inferences for clustered data with small sample size
Statistica Neerlandica 76(3): 309-330. Publication date 2022-08-01; first published online 2022-01-12. DOI: 10.1111/stan.12261
Author manuscript (PMC9355045): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9355045/pdf/nihms-1774814.pdf
Abstract
This paper develops methods to test for associations between two variables with clustered data, using a U-statistic approach with a second-order approximation to the variance of the parameter estimate underlying the test statistic. The tests presented are clustered versions of Pearson's χ² test, the Spearman rank correlation, and Kendall's τ for continuous or ordinal data, as well as alternative measures of Kendall's τ that allow for ties in the data. Shih and Fay use the U-statistic approach but consider only a first-order approximation, which has an inflated significance level in scenarios with small sample sizes. We derive the test statistics using second-order approximations, aiming to improve the type I error rates. The method applies to data where clusters have the same number of measurements for each variable, or where one variable may be measured once per cluster while the other may be measured multiple times. We evaluate the performance of the test statistics through simulations with small sample sizes. All methods are available in the R package cluscor.
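For orientation, the standard Hoeffding variance identity for a degree-2 U-statistic shows what a first-order approximation leaves out. The cluster-level kernel h below, with n independent clusters C_i of sizes m_i and sign concordance averaged over all cross-cluster pairs, is an assumption chosen for illustration. The exact kernel and the second-order variance estimator derived in the paper may differ.

```latex
\hat\tau = \binom{n}{2}^{-1} \sum_{i<j} h(C_i, C_j), \qquad
h(C_i, C_j) = \frac{1}{m_i m_j} \sum_{k=1}^{m_i} \sum_{l=1}^{m_j}
  \operatorname{sign}(x_{ik} - x_{jl})\,\operatorname{sign}(y_{ik} - y_{jl})

\zeta_1 = \operatorname{Var}\bigl(h_1(C_1)\bigr),\quad h_1(c) = E\,h(c, C_2), \qquad
\zeta_2 = \operatorname{Var}\bigl(h(C_1, C_2)\bigr)

\operatorname{Var}(\hat\tau)
  = \frac{2}{n(n-1)}\bigl[2(n-2)\zeta_1 + \zeta_2\bigr]
  = \frac{4\zeta_1}{n} + \frac{2(\zeta_2 - 2\zeta_1)}{n(n-1)}
```

A first-order approximation keeps only 4ζ₁/n. Because ζ₂ ≥ 2ζ₁ for any degree-2 U-statistic, the dropped term is non-negative and of order 1/n². Truncating it therefore tends to understate the variance when the number of clusters is small, which is one way to see why a first-order test can be anti-conservative.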
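A minimal Python sketch of a clustered Kendall's τ computed as a U-statistic over pairs of clusters, with only a first-order (Hoeffding-projection) variance estimate. The kernel averages sign concordance over all cross-cluster pairs of measurements. That kernel is an assumption, and the function names `kernel` and `clustered_kendall_tau` are hypothetical. This is not the cluscor API and does not implement the paper's second-order correction.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple

# One cluster: a sequence of paired (x, y) measurements (hypothetical layout).
Cluster = Sequence[Tuple[float, float]]


def sign(v: float) -> int:
    """Return -1, 0 or 1; tied values contribute 0 to the kernel."""
    return (v > 0) - (v < 0)


def kernel(ci: Cluster, cj: Cluster) -> float:
    """Average concordance over all cross-cluster pairs of measurements.

    Symmetric in (ci, cj), so it is a valid degree-2 U-statistic kernel.
    """
    total = sum(sign(xi - xj) * sign(yi - yj) for xi, yi in ci for xj, yj in cj)
    return total / (len(ci) * len(cj))


def clustered_kendall_tau(clusters: List[Cluster]) -> Tuple[float, float]:
    """Return (tau_hat, first-order variance estimate).

    tau_hat = (2 / (n(n-1))) * sum_{i<j} h(C_i, C_j).
    The variance keeps only the first-order term 4 * zeta_1 / n, with
    zeta_1 estimated from each cluster's mean kernel against the others.
    """
    n = len(clusters)
    if n < 3:
        raise ValueError("need at least 3 clusters")
    h = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        h[i][j] = h[j][i] = kernel(clusters[i], clusters[j])
    tau_hat = sum(h[i][j] for i, j in combinations(range(n), 2)) * 2 / (n * (n - 1))
    # Estimated projection h1(C_i): mean kernel of cluster i against all others.
    h1 = [sum(h[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    zeta1_hat = sum((v - tau_hat) ** 2 for v in h1) / (n - 1)
    return tau_hat, 4 * zeta1_hat / n


if __name__ == "__main__":
    # Made-up toy data: 4 clusters, 2 paired measurements each.
    clusters = [
        [(1.0, 2.0), (1.5, 1.0)],
        [(2.0, 2.5), (3.0, 3.5)],
        [(4.0, 3.0), (4.5, 5.0)],
        [(5.0, 6.0), (6.0, 5.5)],
    ]
    tau, var = clustered_kendall_tau(clusters)
    z = tau / sqrt(var) if var > 0 else float("nan")
    print(f"tau_hat={tau:.3f}, first-order var={var:.4f}, z={z:.2f}")
```

With one measurement per cluster the estimate reduces to the ordinary Kendall's τ-a. Ties score 0, so this sketch does not implement the tie-adjusted τ variants the abstract mentions.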
About the journal:
Statistica Neerlandica has been the journal of the Netherlands Society for Statistics and Operations Research since 1946. It covers all areas of statistics, from theoretical to applied, with a special emphasis on mathematical statistics, statistics for the behavioural sciences and biostatistics. This wide scope is reflected by the expertise of the journal’s editors representing these areas. The diverse editorial board is committed to a fast and fair reviewing process, and will judge submissions on quality, correctness, relevance and originality. Statistica Neerlandica encourages transparency and reproducibility, and offers online resources to make data, code, simulation results and other additional materials publicly available.