Boosting the Power of Rare Variant Association Studies by Imputation Using Large-scale Sequencing Population.

IF 7.9

Genomics, proteomics & bioinformatics Pub Date : 2025-09-17 DOI:10.1093/gpbjnl/qzaf084

Jinglan Dai, Yixin Zhang, Yuan Gao, Hongru Li, Sha Du, Hao Hong, Dongfang You, Zaiming Li, Ruyang Zhang, Yang Zhao, Zhonghua Liu, David C Christiani, Feng Chen, Sipeng Shen


Citations: 0

Abstract



With the emergence of population-scale whole-genome sequencing (WGS), rare variants can be captured precisely. Studying rare variants explains part of the heritability of complex traits that is ignored by conventional genome-wide association studies (GWASs). However, how closely the power of imputed data can approximate, or even exceed, that of WGS in rare variant association studies remains unclear. Using WGS (n = 150,119) as the ground truth, we first evaluated the consistency of rare variants in the single nucleotide polymorphism (SNP) array data imputed from TOPMed or HRC+UK10K in the UK Biobank. Imputation quality (average R-square) of the TOPMed-imputed data could reach 0.6 even for extremely rare variants with minor allele count ≤ 5. TOPMed-imputed data were closer to WGS for three ethnicities, with an average Cramér's V > 0.75. Furthermore, association tests were performed on 45 traits. Under the same sample size, neither of the two imputed datasets outperformed WGS, but the results of the TOPMed-imputed data were more consistent with WGS. When the sample size increased to n = 488,377, the number of identified rare variants in TOPMed-imputed data increased by 27.71% for quantitative traits and approximately 10-fold for binary traits. Finally, we meta-analyzed the association results of SNP array and WGS for lung cancer and epithelial ovarian cancer, respectively. The meta-analysis identified more variants and genes than the WGS-based results alone. Our findings highlight that incorporating rare variants imputed from large-scale sequencing populations can boost the power of rare variant association tests when WGS has limited sample sizes.
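The abstract reports two concordance measures between imputed and sequenced genotypes (R-square and Cramér's V) without saying how they were computed. The following is a minimal sketch under common definitions, not the authors' pipeline: R-square is taken here as the squared Pearson correlation between WGS genotypes (0/1/2) and imputed dosages, and Cramér's V is computed from the contingency table of hard-called genotypes without bias correction. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between sequenced genotypes and imputed dosages.

    Assumption: this is one common definition of imputation R-square;
    the abstract does not state which definition the study used.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # monomorphic in one data set: correlation undefined
    return cov * cov / (var_t * var_d)


def cramers_v(calls_a, calls_b):
    """Cramér's V between two categorical genotype-call vectors (no bias correction)."""
    n = len(calls_a)
    if n != len(calls_b) or n == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    table = Counter(zip(calls_a, calls_b))  # observed joint counts
    row = Counter(calls_a)                  # marginal counts, first vector
    col = Counter(calls_b)                  # marginal counts, second vector
    k = min(len(row), len(col))
    if k < 2:
        return float("nan")  # only one category observed: V undefined
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (table[(a, b)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy data for one rare variant in ten samples.
wgs = [0, 0, 0, 0, 0, 0, 0, 1, 1, 2]
dosage = [0.0, 0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 0.9, 0.4, 1.8]
hard_calls = [round(d) for d in dosage]  # nearest-integer genotype call

print("R-square:", dosage_r2(wgs, dosage))
print("Cramér's V:", cramers_v(wgs, hard_calls))
```

Both measures equal 1 when imputed and sequenced genotypes agree perfectly. For very rare variants the contingency table is dominated by the reference-homozygous cell, so values from a handful of carriers should be read with caution.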


Source journal

Genomics, proteomics & bioinformatics

Self-citation rate

0.00%

Number of publications