REMOVING PLEIOTROPIC SIGNALS REVEAL DISEASE-SPECIFIC GENETIC ARCHITECTURE IN NOISY, SHALLOW BIOBANK PHENOTYPES
Hyunkyung Kim, Na Cai, Andy Dahl
European Neuropsychopharmacology, volume 99, page 45 (published 2025-10-01)
DOI: 10.1016/j.euroneuro.2025.08.547
https://www.sciencedirect.com/science/article/pii/S0924977X25007059
Journal impact factor: 6.7 (JCR Q1, Clinical Neurology)
Citations: 0
Abstract
Pleiotropy is pervasive in complex traits, and understanding it is necessary to distinguish shared from specific genetic effects. Specific effects point to the core biology of a trait, which is especially hard to characterize for heterogeneous traits such as major depressive disorder (MDD). Shared effects, on the other hand, can improve statistical power to detect genetic effects and can be used for polygenic prediction. Large multi-trait genetic datasets, like the UK Biobank, provide opportunities to jointly model these shared and specific effects across thousands of related traits.
However, the standard approach to understanding pleiotropy, genetic correlation, is overly simplistic because it captures only genome-wide aggregate similarity. While more recent approaches have extended genetic correlation to locus-level measures or factor models spanning many traits, it remains challenging to separate trait-specific effects from those that are broadly shared across related phenotypes. For example, genetic effects on alcohol use and neuroticism will affect MDD, yet they are neither specific to MDD nor likely to shed light on its core etiology. Here, we develop a Bayesian matrix factorization approach that addresses these limitations by partitioning high-dimensional pleiotropic relationships into effects that are shared and effects that are specific to a focal trait of interest.
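The shared-versus-specific partition described above can be illustrated with a much simpler, non-Bayesian stand-in: residualizing a focal phenotype on the top principal components of secondary traits. This is a minimal sketch on simulated data, not the authors' method. The function names `specific_component` and `r2`, the simulated variables (`shared`, `core`, `shallow`), and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def specific_component(focal, secondary, n_factors):
    """Residualize a focal trait on the top PCs of the secondary traits.

    Plain PC conditioning, used here only as a stand-in for a
    shared-vs-specific factorization; it is not a Bayesian model.
    """
    # Standardize secondary traits so no single trait dominates the PCs.
    z = (secondary - secondary.mean(axis=0)) / secondary.std(axis=0)
    # Left singular vectors scaled by singular values are the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pcs = u[:, :n_factors] * s[:n_factors]
    # Least-squares fit of the centered focal trait on the PC scores.
    y = focal - focal.mean()
    beta, *_ = np.linalg.lstsq(pcs, y, rcond=None)
    return y - pcs @ beta

def r2(a, b):
    """Squared Pearson correlation between two vectors."""
    return np.corrcoef(a, b)[0, 1] ** 2

# Simulated example (all quantities are assumptions for illustration).
rng = np.random.default_rng(0)
n, n_traits = 5000, 40
shared = rng.normal(size=n)   # liability shared with secondary traits
core = rng.normal(size=n)     # trait-specific liability (the "gold standard")
loadings = rng.uniform(0.5, 1.0, size=n_traits)
secondary = shared[:, None] * loadings + rng.normal(size=(n, n_traits))
# Shallow phenotype: core signal contaminated by the shared liability and noise.
shallow = core + shared + 0.5 * rng.normal(size=n)

adjusted = specific_component(shallow, secondary, n_factors=1)
print("R2 with core before adjustment:", r2(shallow, core))
print("R2 with core after adjustment: ", r2(adjusted, core))
```

In this toy setting the adjusted phenotype tracks the trait-specific liability more closely because the shared liability is well captured by the first PC. With real biobank data the number of factors and the choice of secondary traits would need to be selected carefully.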
First, we applied our approach to simulated data to demonstrate that it can reliably separate genetic effects that are specific to a trait from those that are mediated through secondary traits. Our approach outperforms other factorization-based approaches, such as conditioning on phenome-wide principal components (PCs). We then applied our approach to identify MDD-specific genetic effects in UK Biobank by accounting for shared genetic effects across 216 MDD-relevant traits. Specifically, we excluded the best-available measure, LifetimeMDD, and evaluated our ability to recapitulate this measure from two lower-quality measures, a GP-based measure and ICD10-based depression. We first showed that our approach yields more specific phenotypes, which are more strongly correlated with LifetimeMDD (R² increases from 0.551 and 0.272 to 0.634 for the GP and ICD10 measures, respectively). Next, we showed that our approach yields better polygenic scores for predicting LifetimeMDD (R² increases from 0.081 and 0.035 to 0.097 for the GP and ICD10 measures, respectively; both bootstrap p < 0.01).
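The abstract does not describe how the bootstrap p-values for the R² gains were computed. One common way to do such a comparison is a paired bootstrap over individuals, sketched below under that assumption. The function `bootstrap_r2_gain`, its parameters, and the simulated scores are hypothetical and are not taken from the study.

```python
import numpy as np

def r2(a, b):
    """Squared Pearson correlation between two vectors."""
    return np.corrcoef(a, b)[0, 1] ** 2

def bootstrap_r2_gain(target, baseline, improved, n_boot=1000, seed=0):
    """Paired bootstrap for the gain in R2 of `improved` over `baseline`.

    Returns the observed gain and a one-sided p-value: the share of
    resamples in which the gain is not positive (with a +1 correction).
    """
    rng = np.random.default_rng(seed)
    n = len(target)
    gains = np.empty(n_boot)
    for i in range(n_boot):
        # Resample individuals with replacement, keeping the three
        # vectors paired so both scores see the same individuals.
        idx = rng.integers(0, n, size=n)
        gains[i] = r2(improved[idx], target[idx]) - r2(baseline[idx], target[idx])
    observed = r2(improved, target) - r2(baseline, target)
    p_value = (np.sum(gains <= 0) + 1) / (n_boot + 1)
    return observed, p_value

# Simulated example (assumed effect sizes, for illustration only).
rng = np.random.default_rng(1)
n = 4000
target = rng.normal(size=n)                      # stand-in for a gold-standard phenotype
baseline = 0.3 * target + rng.normal(size=n)     # weaker predictor
improved = 0.5 * target + rng.normal(size=n)     # stronger predictor

gain, p = bootstrap_r2_gain(target, baseline, improved)
print("observed R2 gain:", gain)
print("one-sided bootstrap p:", p)
```

Resampling the same individuals for both scores keeps the comparison paired, which usually gives a tighter estimate of the gain than bootstrapping each score independently.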
Overall, our approach can be applied to any large-scale, noisy biobank phenotype to improve its disorder specificity. This is an important step toward bridging the gap between carefully phenotyped and shallowly phenotyped datasets, which is essential for deriving powerful and specific genetic associations in complex traits.
About the journal:
European Neuropsychopharmacology is the official publication of the European College of Neuropsychopharmacology (ECNP). In accordance with the mission of the College, the journal focuses on clinical and basic science contributions that advance our understanding of brain function and human behaviour and enable translation into improved treatments and enhanced public health impact in psychiatry. Recent years have been characterized by exciting advances in basic knowledge and available experimental techniques in neuroscience and genomics. However, clinical translation of these findings has not been as rapid. The journal aims to narrow this gap by promoting findings that are expected to have a major impact on both our understanding of the biological bases of mental disorders and the development and improvement of treatments, ideally paving the way for prevention and recovery.