Joel Mefford, Molly Smullen, Felix Zhang, Michal Sadowski, Richard Border, Andy Dahl, Jonathan Flint, Noah Zaitlen
American Journal of Human Genetics, volume 112, issue 6, pages 1363-1375. Published 2025-06-05. DOI: 10.1016/j.ajhg.2025.04.013 (https://doi.org/10.1016/j.ajhg.2025.04.013). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12256909/pdf/
Beyond predictive R2: Quantile regression and non-equivalence tests reveal complex relationships of traits and polygenic scores.
Polygenic scores (PGSs) are genetic predictions of trait values or disease risk that are increasingly finding applications in clinical predictive models and basic genetics research. However, the predictive value of a PGS can vary within similar population groups, depending on characteristics such as the environmental exposures, sex, age, or socioeconomic status of the individuals. To maximize the value of a PGS, approaches to screen trait-PGS pairs for evidence of such heterogeneity without having to specify the relevant exposure or individual characteristics would be useful. Here, in analyses from the UK Biobank, we show that a PGS's predictive accuracy depends on the quantile of the phenotypic distribution to which the PGS is being applied. We quantify differences in predictive value across the phenotypic range using quantile regression linear models to estimate quantile-specific effect sizes for linear models of phenotype values as a function of PGS. Of 25 continuous traits, only three have no quantile-specific effect sizes that varied by at least 1.2-fold from the ordinary least squares estimate. Through simulation, we demonstrate that this heterogeneity of PGS predictive value can arise from gene-by-environment interactions. Our approach can be used to flag traits where the use of PGSs warrants extra caution, and perhaps stratification variables should be sought and used because PGSs perform substantially differently in portions of the sampled population than expected from quoted predictive R2 or incremental R2 values that represent average performance across a dataset.
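The screening idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use UK Biobank data: the simulated gene-by-environment model, its coefficients, the sample size, the chosen quantiles, and all function names are assumptions made for illustration. The symmetric reading of the 1.2-fold threshold (ratio at least 1.2 or at most 1/1.2) is also an assumption. The quantile regression slope is found with a simple ternary search over the pinball (check) loss so that the example needs only the Python standard library; in practice a library routine such as `QuantReg` in statsmodels would normally be used instead.

```python
import math
import random


def pinball_loss(residuals, tau):
    """Check loss minimized by quantile regression at quantile tau."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def best_intercept(values, tau):
    """A tau-quantile of values, which minimizes the pinball loss over the intercept."""
    ordered = sorted(values)
    k = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[k]


def profile_loss(slope, x, y, tau):
    """Pinball loss at a given slope after optimizing the intercept."""
    partial = [yi - slope * xi for xi, yi in zip(x, y)]
    intercept = best_intercept(partial, tau)
    return pinball_loss([p - intercept for p in partial], tau)


def quantile_slope(x, y, tau, lo=-5.0, hi=5.0, iters=60):
    """Ternary search for the slope; the profiled loss is convex in the slope.

    The bracket [lo, hi] is an assumption that suits this toy data.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if profile_loss(m1, x, y, tau) <= profile_loss(m2, x, y, tau):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def ols_slope(x, y):
    """Ordinary least squares slope for a single predictor."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate_gxe(n, seed):
    """Hypothetical GxE model: the PGS effect is larger in the exposed group."""
    rng = random.Random(seed)
    pgs, trait = [], []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)                  # standardized PGS
        e = 1.0 if rng.random() < 0.5 else 0.0   # unobserved binary exposure
        noise = rng.gauss(0.0, 0.5)
        pgs.append(g)
        trait.append(0.2 * g + 1.0 * e + 0.6 * g * e + noise)
    return pgs, trait


pgs, trait = simulate_gxe(2000, seed=1)
beta_ols = ols_slope(pgs, trait)
taus = [0.1, 0.5, 0.9]
beta_q = {tau: quantile_slope(pgs, trait, tau) for tau in taus}
fold = {tau: beta_q[tau] / beta_ols for tau in taus}

# Flag the trait if any quantile-specific slope differs from OLS by 1.2-fold or more.
flagged = any(f >= 1.2 or f <= 1 / 1.2 for f in fold.values())

print(f"OLS slope: {beta_ols:.3f}")
for tau in taus:
    print(f"tau={tau}: slope={beta_q[tau]:.3f}, fold vs OLS={fold[tau]:.2f}")
print("flag for heterogeneity:", flagged)
```

In this toy model the exposure is never given to the regression, yet the slopes at the lower and upper quantiles diverge from the OLS slope, which is the kind of signal the screening approach looks for without needing to name the exposure.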
About the journal:
The American Journal of Human Genetics (AJHG) is a monthly journal published by Cell Press, chosen by The American Society of Human Genetics (ASHG) as its premier publication starting from January 2008. AJHG represents Cell Press's first society-owned journal, and both ASHG and Cell Press anticipate significant synergies between AJHG content and that of other Cell Press titles.