Joint analysis of multiple phenotypes for extremely unbalanced case-control association studies

IF 3.8, Zone 4 (Medicine), Q3, GENETICS & HEREDITY

Genetic Epidemiology Pub Date : 2023-01-24 DOI:10.1002/gepi.22513

Hongjing Xie, Xuewei Cao, Shuanglin Zhang, Qiuying Sha

Genetic Epidemiology, volume 47, issue 2, pages 185-197. https://onlinelibrary.wiley.com/doi/10.1002/gepi.22513

Citations: 1

Abstract


Joint analysis of multiple phenotypes for extremely unbalanced case-control association studies

In genome-wide association studies (GWAS) for thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for joint analysis of multiple phenotypes produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In this research, we develop a method to jointly analyze multiple unbalanced case-control phenotypes to circumvent this issue. We first group multiple phenotypes into different clusters based on a hierarchical clustering method, then we merge the phenotypes in each cluster into a single phenotype. In each cluster, we use the saddlepoint approximation to estimate the p value of an association test between the merged phenotype and a single nucleotide polymorphism (SNP), which eliminates the issue of inflated type I error rates for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to obtain an integrated p value across all clusters to test the association between multiple phenotypes and a SNP. We use extensive simulation studies to evaluate the performance of the proposed approach. The results show that the proposed approach controls the type I error rate very well and is more powerful than other available methods. We also apply the proposed approach to phenotypes in category IX (diseases of the circulatory system) in the UK Biobank. We find that the proposed approach identifies more significant SNPs than the other viable methods we compared it with.
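The final step of the abstract, combining the per-cluster p values with the Cauchy combination method, can be illustrated with a minimal sketch. This is not the authors' code: the function name `cauchy_combination`, the default of equal weights, and the example p values are assumptions made for illustration. The sketch uses the standard form of the Cauchy combination test, in which each p value is mapped to a Cauchy variate, the variates are averaged with weights that sum to one, and the result is converted back to a p value through the standard Cauchy tail.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p values with the Cauchy combination test.

    Hypothetical helper for illustration; equal weights are an assumption.
    """
    if not p_values:
        raise ValueError("need at least one p value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    if weights is None:
        # Assumed default: every cluster gets the same weight.
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    # Test statistic: weighted sum of Cauchy-transformed p values.
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))

    # Upper-tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi


# Example: made-up p values for three phenotype clusters at one SNP.
cluster_p_values = [1e-8, 0.5, 0.9]
print(cauchy_combination(cluster_p_values))
```

With a single cluster the function returns that cluster's p value unchanged, and one very small p value dominates the combined result, which is the behavior that makes this kind of combination attractive when only some clusters carry signal.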


Source journal

Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore

4.40

Self-citation rate

9.50%

Articles published

Review time

6-12 weeks

About the journal: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.