Mapping the relative accuracy of cross-ancestry prediction
Alexa S. Lupi, Ana I. Vazquez, Gustavo de los Campos
Nature Communications, published 2024-12-02. DOI: https://doi.org/10.1038/s41467-024-54727-8
Abstract
The overwhelming majority of participants in genome-wide association studies (GWAS) have European (EUR) ancestry, and polygenic scores (PGS) derived from EURs often perform poorly in non-EURs. Previous studies suggest that between-ancestry differences in allele frequencies and linkage disequilibrium are significant contributors to the poor portability of PGS in cross-ancestry prediction. We hypothesize that the portability of (local) PGS varies significantly over the genome. Therefore, we develop a method, MC-ANOVA, to estimate the loss of accuracy in cross-ancestry prediction attributable to allele frequency and linkage disequilibrium differences between ancestries. Using data from the UK Biobank, we develop PGS relative accuracy (RA) maps quantifying the local portability of EUR-derived PGS in non-EUR ancestries. We report substantial variability in RA along the genome, suggesting that even in ancestries with low overall RA of EUR-derived effects (e.g., African), there are regions with high RA. We substantiate our findings using six complex traits, which show that EUR-derived effects from regions where MC-ANOVA predicts high RA also have high empirical RA in real PGS. We provide software implementing MC-ANOVA and RA maps for several non-EUR ancestries. These maps can be used to interpret similarities and differences in GWAS results between groups and to improve cross-ancestry prediction.
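To make the notion of an empirical RA map concrete, here is a minimal sketch. It is not the authors' MC-ANOVA method or software. It assumes that RA is the ratio of the squared correlation (R²) achieved by EUR-derived effects in a target ancestry to the R² achieved in EUR, computed separately for consecutive windows of SNPs. The function names, the fixed-size windowing, and the input layout (individuals in rows, SNPs in columns) are all hypothetical choices made for illustration.

```python
import numpy as np


def r_squared(pgs, y):
    """Squared Pearson correlation between a polygenic score and a phenotype."""
    return float(np.corrcoef(pgs, y)[0, 1] ** 2)


def relative_accuracy(pgs_target, y_target, pgs_eur, y_eur):
    """Assumed definition: RA = R^2 in the target ancestry / R^2 in EUR."""
    return r_squared(pgs_target, y_target) / r_squared(pgs_eur, y_eur)


def local_ra_map(X_eur, y_eur, X_target, y_target, effects, window):
    """Empirical RA of local PGS for consecutive windows of `window` SNPs.

    X_eur, X_target : genotype matrices (individuals x SNPs), same SNP order
    effects         : EUR-derived SNP effects, one per SNP
    """
    n_snps = effects.shape[0]
    ra = []
    for start in range(0, n_snps, window):
        idx = slice(start, start + window)
        b = effects[idx]
        # Local PGS: the same EUR-derived effects applied to both groups.
        ra.append(
            relative_accuracy(X_target[:, idx] @ b, y_target,
                              X_eur[:, idx] @ b, y_eur)
        )
    return np.array(ra)
```

A window in which the local PGS is constant across individuals yields an undefined correlation, so real data would need a guard for monomorphic regions; the sketch omits this for brevity.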
About the journal:
Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.