Title: Mathematical bounds on $r^2$ and the effect size in case-control genome-wide association studies
Authors: Sanjana M. Paye, Michael D. Edge
Journal: Theoretical Population Biology, volume 164, pages 1-11
Published: 2025-05-15
Publication type: Journal Article
DOI: 10.1016/j.tpb.2025.04.003
URL: https://www.sciencedirect.com/science/article/pii/S0040580925000280
Impact factor: 1.2 (JCR Q4, Ecology)
Citations: 0
Abstract
Case-control genome-wide association studies (GWAS) are often used to find associations between genetic variants and diseases. When case-control GWAS are conducted, researchers must make decisions regarding how many cases and how many controls to include in the study. Connections between variants and diseases are made using association statistics, including $\chi^2$. Previous work in population genetics has shown that LD statistics, including $r^2$, are bounded by the allele frequencies in the population being studied. Since varying the case fraction changes sample allele frequencies, we use the known bounds on $r^2$ to explore how the fraction of cases included in a study can affect statistical power to detect associations. We analyze a simple mathematical model and use simulations to study a quantity proportional to the $\chi^2$ noncentrality parameter, which is closely related to $r^2$, under various conditions. Varying the case fraction changes the $\chi^2$ noncentrality parameter, and by extension the statistical power, with effects depending on the dominance, penetrance, and frequency of the risk allele. Our framework explains previously observed results, such as asymmetries in power to detect risk vs. protective alleles, and the fact that a balanced sample of cases and controls does not always give the best power to detect associations, particularly for highly penetrant minor risk alleles that are either dominant or recessive. We show by simulation that our results can be used as a rough guide to statistical power for association tests other than $\chi^2$ tests of independence.
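The link between case fraction, sample allele frequency, and $r^2$ can be illustrated with a small numerical sketch. The model below is an assumption made for illustration and is not taken from the paper: a haploid-style toy model in which each sampled allele either is or is not the risk allele, with penetrances `f_carrier` and `f_noncarrier`. All function and parameter names are hypothetical. It relies on two standard facts: for a 2x2 table the $\chi^2$ statistic equals $N r^2$, and the covariance between two binary variables with frequencies $p_A$ and $p_B$ is at most $\min(p_A(1-p_B), (1-p_A)p_B)$, which caps $r^2$.

```python
def case_control_allele_freqs(p, f_carrier, f_noncarrier):
    """Risk-allele frequency among cases and among controls.

    Toy haploid-style model (an assumption, not the paper's model):
    p is the population risk-allele frequency, f_carrier and
    f_noncarrier are disease probabilities with and without the allele.
    """
    prevalence = p * f_carrier + (1 - p) * f_noncarrier
    p_case = p * f_carrier / prevalence
    p_ctrl = p * (1 - f_carrier) / (1 - prevalence)
    return p_case, p_ctrl


def r2_allele_status(p, f_carrier, f_noncarrier, case_fraction):
    """Squared correlation between allele and case status in the sample.

    Returns (r2, p_sample), where p_sample is the risk-allele frequency
    in a sample with the given fraction of cases (0 < case_fraction < 1).
    """
    p_case, p_ctrl = case_control_allele_freqs(p, f_carrier, f_noncarrier)
    p_sample = case_fraction * p_case + (1 - case_fraction) * p_ctrl
    # Covariance between the allele indicator and the case indicator.
    cov = case_fraction * (1 - case_fraction) * (p_case - p_ctrl)
    var_allele = p_sample * (1 - p_sample)
    var_status = case_fraction * (1 - case_fraction)
    return cov**2 / (var_allele * var_status), p_sample


def r2_upper_bound(p_a, p_b):
    """Largest r^2 possible for two binary variables with positive covariance."""
    cov_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return cov_max**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


def best_case_fraction(p, f_carrier, f_noncarrier, steps=99):
    """Case fraction on a grid that maximizes r^2 (chi-square per unit N)."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(grid, key=lambda phi: r2_allele_status(p, f_carrier, f_noncarrier, phi)[0])


if __name__ == "__main__":
    # Rare, highly penetrant risk allele (illustrative values only).
    p, f1, f0 = 0.05, 0.9, 0.05
    for phi in (0.1, 0.3, 0.5, 0.7):
        r2, p_s = r2_allele_status(p, f1, f0, phi)
        print(f"case fraction {phi:.1f}: r2 = {r2:.3f}, bound = {r2_upper_bound(p_s, phi):.3f}")
    print("best case fraction on grid:", best_case_fraction(p, f1, f0))
```

In this toy setting, changing the case fraction shifts the sample allele frequency, which moves both $r^2$ and its upper bound. For the rare, highly penetrant allele used above, the grid search picks a case fraction below one half. Diploid genotypes and dominance, which the abstract highlights, are deliberately left out of the sketch.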
About the journal:
An interdisciplinary journal, Theoretical Population Biology presents articles on theoretical aspects of the biology of populations, particularly in the areas of demography, ecology, epidemiology, evolution, and genetics. Emphasis is on the development of mathematical theory and models that enhance the understanding of biological phenomena.
Articles highlight the motivation and significance of the work for advancing progress in biology, relying on a substantial mathematical effort to obtain biological insight. The journal also presents empirical results and computational and statistical methods directly impinging on theoretical problems in population biology.