Testing for differences in polygenic scores in the presence of confounding.

IF 3.3 | Region 3, Biology | Q2 GENETICS & HEREDITY

Genetics Pub Date : 2025-06-04 DOI:10.1093/genetics/iyaf071

Jennifer Blanc, Jeremy J Berg

Citations: 0

Abstract

Polygenic scores have become an important tool in human genetics, enabling the prediction of individuals' phenotypes from their genotypes. Understanding how the pattern of differences in polygenic score predictions across individuals intersects with variation in ancestry can provide insights into the evolutionary forces acting on the trait in question and is important for understanding health disparities. However, because most polygenic scores are computed using effect estimates from population samples, they are susceptible to confounding by both genetic and environmental effects that are correlated with ancestry. The extent to which this confounding drives patterns in the distribution of polygenic scores depends on the patterns of population structure in both the original estimation panel and the prediction/test panel. Here, we use theory from population and statistical genetics, together with simulations, to study the procedure of testing for an association between polygenic scores and axes of ancestry variation in the presence of confounding. We use a general model of genetic relatedness to describe how confounding in the estimation panel biases the distribution of polygenic scores in ways that depend on the degree of overlap in population structure between panels. We then show how this confounding can bias tests for associations between polygenic scores and important axes of ancestry variation in the test panel. Specifically, for any given test, there exists a single axis of population structure in the genome-wide association study (GWAS) panel that needs to be controlled for in order to protect the test. In the context of this result, we study the behavior of multiple approaches to control for stratification along this axis, including standard methods such as using principal components as fixed covariates in the GWAS, linear mixed models, and a novel approach for directly estimating the axis using the test panel genotypes. Our analyses highlight the role of estimation noise in the models of population structure as a plausible source of residual confounding in polygenic score analyses.
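The mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' code or method; every function name, parameter, and value is a hypothetical choice made for illustration. It assumes two populations shared by the GWAS panel and the test panel, a phenotype with no genetic effects at all, and an environmental shift between populations. It also assumes the confounding axis (the population label) is known exactly, whereas in practice it has to be estimated, for example with principal components, which is where the abstract locates residual confounding.

```python
import numpy as np


def simulate_panel(rng, freqs_a, freqs_b, n_per_pop):
    """Draw diploid genotypes for two populations; return (genotypes, labels)."""
    g_a = rng.binomial(2, freqs_a, size=(n_per_pop, freqs_a.size))
    g_b = rng.binomial(2, freqs_b, size=(n_per_pop, freqs_b.size))
    labels = np.repeat([0.0, 1.0], n_per_pop)  # 0 = population A, 1 = population B
    return np.vstack([g_a, g_b]).astype(float), labels


def marginal_gwas(genotypes, phenotype, covariate=None):
    """Per-SNP least-squares slopes, optionally adjusting for one covariate."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    if covariate is not None:
        # Project the covariate out of both genotypes and phenotype.
        c = covariate - covariate.mean()
        g = g - np.outer(c, c @ g) / (c @ c)
        y = y - c * (c @ y) / (c @ c)
    return (g.T @ y) / np.einsum("ij,ij->j", g, g)


def pgs_difference(test_genotypes, test_labels, effects):
    """Mean polygenic score in population B minus population A (test panel)."""
    scores = test_genotypes @ effects
    return scores[test_labels == 1].mean() - scores[test_labels == 0].mean()


def run(seed=0, n_snps=2000, n_per_pop=500, env_shift=1.0):
    rng = np.random.default_rng(seed)
    # Allele frequencies differ slightly between the two populations.
    base = rng.uniform(0.2, 0.8, n_snps)
    drift = rng.normal(0.0, 0.05, n_snps)
    freqs_a = np.clip(base - drift, 0.05, 0.95)
    freqs_b = np.clip(base + drift, 0.05, 0.95)

    gwas_g, gwas_pop = simulate_panel(rng, freqs_a, freqs_b, n_per_pop)
    test_g, test_pop = simulate_panel(rng, freqs_a, freqs_b, n_per_pop)

    # Phenotype is purely environmental: no SNP has a causal effect.
    y = env_shift * gwas_pop + rng.normal(size=gwas_pop.size)

    naive = marginal_gwas(gwas_g, y)
    adjusted = marginal_gwas(gwas_g, y, covariate=gwas_pop)
    return (pgs_difference(test_g, test_pop, naive),
            pgs_difference(test_g, test_pop, adjusted))


if __name__ == "__main__":
    naive_diff, adjusted_diff = run()
    print(f"PGS difference, unadjusted GWAS: {naive_diff:.2f}")
    print(f"PGS difference, axis-adjusted GWAS: {adjusted_diff:.2f}")
```

Under these assumptions the unadjusted effect estimates track the allele-frequency differences between populations, so the polygenic score differs between test populations even though the trait has no genetic basis. Adjusting the GWAS for the single relevant axis should bring the difference close to zero in this idealized setting.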

Source journal: Genetics (GENETICS & HEREDITY)

CiteScore: 6.90

Self-citation rate: 6.10%

Articles published: 177

Review time: 1.5 months

About the journal: GENETICS is published by the Genetics Society of America, a scholarly society that seeks to deepen our understanding of the living world by advancing our understanding of genetics. Since 1916, GENETICS has published high-quality, original research presenting novel findings bearing on genetics and genomics. The journal publishes empirical studies of organisms ranging from microbes to humans, as well as theoretical work. While it has an illustrious history, GENETICS has changed along with the communities it serves: it is not your mentor's journal. The editors make decisions quickly, in around 30 days, without sacrificing the excellence and scholarship for which the journal has long been known. GENETICS is a peer-reviewed, peer-edited journal, with an international reach and increasing visibility and impact. All editorial decisions are made through collaboration of at least two editors who are practicing scientists. GENETICS is constantly innovating: expanded types of content include Reviews, Commentary (current issues of interest to geneticists), Perspectives (historical), Primers (to introduce primary literature into the classroom), Toolbox Reviews, plus YeastBook, FlyBook, and WormBook (coming spring 2016). For particularly time-sensitive results, we publish Communications. As part of our mission to serve our communities, we've published thematic collections, including Genomic Selection, Multiparental Populations, Mouse Collaborative Cross, and the Genetics of Sex.