Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes.

IF 4.0 | CAS Zone 2 (Biology) | JCR Q1, GENETICS & HEREDITY

PLoS Genetics | Pub Date: 2025-01-07 | eCollection Date: 2025-01-01 | DOI: 10.1371/journal.pgen.1011519

Deborah Kunkel, Peter Sørensen, Vijay Shankar, Fabio Morgante

PLoS Genetics 21(1): e1011519. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11741642/pdf/

Citations: 0

Abstract

Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy, was introduced. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goal of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in the UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data set has smaller sample size.
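The abstract's central technical claim is that mr.mash-rss needs only GWAS summary statistics and a reference LD matrix rather than individual-level genotypes and phenotypes. One way to see why this can work: many regression methods depend on the data only through the sufficient statistics X'X and X'Y, and under standardization both can be rebuilt from summary data. The Python sketch below illustrates this under stated assumptions: standardized genotypes and phenotypes, one common sample size n, and marginal z-scores from simple linear regression with an intercept. It is a generic construction, not necessarily the exact transformation used by mr.mash-rss; `summary_to_sufficient_stats` is a hypothetical helper, not part of any published package, and the Gaussian "genotypes" are toy stand-ins. It also leaves out the phenotypic covariance Y'Y, which a multi-phenotype method would additionally need.

```python
import math
import random


def standardize(v):
    """Center v and scale it so that sum(x**2) == len(v) (population SD)."""
    n = len(v)
    mean = sum(v) / n
    centered = [x - mean for x in v]
    sd = math.sqrt(sum(c * c for c in centered) / n)
    return [c / sd for c in centered]


def marginal_z(x, y):
    """GWAS-style z-score: slope / SE from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    beta = sxy / sxx
    rss = syy - beta * sxy  # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta / se


def summary_to_sufficient_stats(Z, R, n):
    """Hypothetical helper: rebuild X'X and X'Y from summary data.

    Z: p x r marginal z-scores (SNPs x phenotypes); R: p x p LD correlations;
    n: GWAS sample size, assumed equal for every phenotype. With standardized
    genotypes and phenotypes, X'X = n * R and (X'Y)[j][k] = n * r_jk, where the
    marginal correlation is r_jk = z_jk / sqrt(n - 2 + z_jk**2).
    """
    XtX = [[n * rij for rij in row] for row in R]
    XtY = [[n * z / math.sqrt(n - 2 + z * z) for z in row] for row in Z]
    return XtX, XtY


# Toy check: Gaussian stand-ins for genotypes, 3 SNPs, 2 phenotypes.
random.seed(1)
n, p, r = 500, 3, 2
raw = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
raw[1] = [0.8 * a + 0.6 * b for a, b in zip(raw[0], raw[1])]  # LD between SNPs 0 and 1
X = [standardize(col) for col in raw]  # X[j] holds SNP j for all n samples
effects = [[0.3, 0.2], [0.0, 0.0], [0.1, -0.2]]  # effects[j][k]: SNP j on phenotype k
Y = []
for k in range(r):
    y = [sum(effects[j][k] * X[j][i] for j in range(p)) + random.gauss(0, 1)
         for i in range(n)]
    Y.append(standardize(y))

# Summary data a GWAS would publish (z-scores) plus an LD matrix (here in-sample).
Z = [[marginal_z(X[j], Y[k]) for k in range(r)] for j in range(p)]
R = [[sum(a * b for a, b in zip(X[j], X[l])) / n for l in range(p)] for j in range(p)]
XtX, XtY = summary_to_sufficient_stats(Z, R, n)

# The same quantity computed directly from individual-level data.
XtY_direct = [[sum(a * b for a, b in zip(X[j], Y[k])) for k in range(r)] for j in range(p)]
max_err = max(abs(a - b) for ra, rb in zip(XtY, XtY_direct) for a, b in zip(ra, rb))
print(f"max |X'Y(summary) - X'Y(direct)| = {max_err:.2e}")
```

With in-sample LD, as in this toy check, the reconstruction should agree with the direct computation up to floating-point error. With LD estimated from a separate reference panel, X'X = nR is only an approximation, and mismatch between the panel and the GWAS sample is a known source of error for summary-statistic methods.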


Source journal: PLoS Genetics (GENETICS & HEREDITY)

Self-citation rate: 2.20%

Articles published: 438

Journal description: PLOS Genetics is run by an international Editorial Board, headed by Editors-in-Chief Greg Barsh (HudsonAlpha Institute for Biotechnology and Stanford University School of Medicine) and Greg Copenhaver (The University of North Carolina at Chapel Hill). Articles published in PLOS Genetics are archived in PubMed Central and cited in PubMed.