在五个生物库中对多基因评分方法进行的评估显示，生物库之间的差异大于方法之间的差异，并发现了集合学习的优势。

IF 8.1 1区生物学 Q1 GENETICS & HEREDITY

American journal of human genetics Pub Date : 2024-07-11 Epub Date: 2024-06-21 DOI:10.1016/j.ajhg.2024.06.003

Remo Monti, Lisa Eick, Georgi Hudjashov, Kristi Läll, Stavroula Kanoni, Brooke N Wolford, Benjamin Wingfield, Oliver Pain, Sophie Wharrie, Bradley Jermy, Aoife McMahon, Tuomo Hartonen, Henrike Heyne, Nina Mars, Samuel Lambert, Kristian Hveem, Michael Inouye, David A van Heel, Reedik Mägi, Pekka Marttinen, Samuli Ripatti, Andrea Ganna, Christoph Lippert

{"title":"在五个生物库中对多基因评分方法进行的评估显示，生物库之间的差异大于方法之间的差异，并发现了集合学习的优势。","authors":"Remo Monti, Lisa Eick, Georgi Hudjashov, Kristi Läll, Stavroula Kanoni, Brooke N Wolford, Benjamin Wingfield, Oliver Pain, Sophie Wharrie, Bradley Jermy, Aoife McMahon, Tuomo Hartonen, Henrike Heyne, Nina Mars, Samuel Lambert, Kristian Hveem, Michael Inouye, David A van Heel, Reedik Mägi, Pekka Marttinen, Samuli Ripatti, Andrea Ganna, Christoph Lippert","doi":"10.1016/j.ajhg.2024.06.003","DOIUrl":null,"url":null,"abstract":"Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online-results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.","PeriodicalId":7659,"journal":{"name":"American journal of human genetics","volume":" ","pages":"1431-1447"},"PeriodicalIF":8.1000,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11267524/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning.\",\"authors\":\"Remo Monti, Lisa Eick, Georgi Hudjashov, Kristi Läll, Stavroula Kanoni, Brooke N Wolford, Benjamin Wingfield, Oliver Pain, Sophie Wharrie, Bradley Jermy, Aoife McMahon, Tuomo Hartonen, Henrike Heyne, Nina Mars, Samuel Lambert, Kristian Hveem, Michael Inouye, David A van Heel, Reedik Mägi, Pekka Marttinen, Samuli Ripatti, Andrea Ganna, Christoph Lippert\",\"doi\":\"10.1016/j.ajhg.2024.06.003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online-results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.\",\"PeriodicalId\":7659,\"journal\":{\"name\":\"American journal of human genetics\",\"volume\":\" \",\"pages\":\"1431-1447\"},\"PeriodicalIF\":8.1000,\"publicationDate\":\"2024-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11267524/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American journal of human genetics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1016/j.ajhg.2024.06.003\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/6/21 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"GENETICS & HEREDITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American journal of human genetics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1016/j.ajhg.2024.06.003","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/6/21 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}

引用次数: 0

摘要

从全基因组关联研究中估算多基因分数（PGS）的方法越来越多地被使用。然而，目前还缺乏独立的方法评估，而且方法比较通常也很有限。在此，我们以参考标准化框架为基础，评估了五项生物库研究（共约 120 万参与者）中通过七种方法得出的多基因分数，涉及 16 种疾病和定量性状。我们进行了元分析，以量化方法选择、超参数调整、方法组合和目标生物库对 PGS 性能的影响。我们发现，没有一种方法能始终优于所有其他方法。在方法调整良好的情况下，不同生物库之间的 PGS 效应大小比生物库内不同方法之间的差异更大。在调查的两种自身免疫性疾病（血清阳性类风湿性关节炎和 1 型糖尿病）中，不同方法之间的差异最大。对于大多数方法来说，交叉验证比自动调整（不使用目标数据）更能可靠地调整超参数。对于给定的目标表型，在英国生物库中调整的结合了不同方法的PGS（集合PGS）的弹性网模型提供了一致、高和可跨生物库转移的性能，相对于LDpred2和MegaPRS（经交叉验证调整后性能最好的两种单一方法），PGS效应大小（β系数）的中位数增加了5.0%。我们可交互式浏览的在线结果和开源工作流程 prspipe 为跨生物库的多基因评分方法分析提供了丰富的资源和参考。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning.

Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online-results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

American journal of human genetics 生物-遗传学

CiteScore

14.70

自引率

4.10%

发文量

185

审稿时长

1 months

期刊介绍： The American Journal of Human Genetics (AJHG) is a monthly journal published by Cell Press, chosen by The American Society of Human Genetics (ASHG) as its premier publication starting from January 2008. AJHG represents Cell Press's first society-owned journal, and both ASHG and Cell Press anticipate significant synergies between AJHG content and that of other Cell Press titles.