Evaluation of Bayesian Linear Regression models for gene set prioritization in complex diseases.

IF 4 2区 生物学 Q1 GENETICS & HEREDITY
Tahereh Gholipourshahraki, Zhonghao Bai, Merina Shrestha, Astrid Hjelholt, Sile Hu, Mads Kjolby, Palle Duun Rohde, Peter Sørensen
{"title":"Evaluation of Bayesian Linear Regression models for gene set prioritization in complex diseases.","authors":"Tahereh Gholipourshahraki, Zhonghao Bai, Merina Shrestha, Astrid Hjelholt, Sile Hu, Mads Kjolby, Palle Duun Rohde, Peter Sørensen","doi":"10.1371/journal.pgen.1011463","DOIUrl":null,"url":null,"abstract":"<p><p>Genome-wide association studies (GWAS) provide valuable insights into the genetic architecture of complex traits, yet interpreting their results remains challenging due to the polygenic nature of most traits. Gene set analysis offers a solution by aggregating genetic variants into biologically relevant pathways, enhancing the detection of coordinated effects across multiple genes. In this study, we present and evaluate a gene set prioritization approach utilizing Bayesian Linear Regression (BLR) models to uncover shared genetic components among different phenotypes and facilitate biological interpretation. Through extensive simulations and analyses of real traits, we demonstrate the efficacy of the BLR model in prioritizing pathways for complex traits. Simulation studies reveal insights into the model's performance under various scenarios, highlighting the impact of factors such as the number of causal genes, proportions of causal variants, heritability, and disease prevalence. Comparative analyses with MAGMA (Multi-marker Analysis of GenoMic Annotation) demonstrate BLR's superior performance, especially in highly overlapped gene sets. Application of both single-trait and multi-trait BLR models to real data, specifically GWAS summary data for type 2 diabetes (T2D) and related phenotypes, identifies significant associations with T2D-related pathways. Furthermore, comparison between single- and multi-trait BLR analyses highlights the superior performance of the multi-trait approach in identifying associated pathways, showcasing increased statistical power when analyzing multiple traits jointly. Additionally, enrichment analysis with integrated data from various public resources supports our results, confirming significant enrichment of diabetes-related genes within the top T2D pathways resulting from the multi-trait analysis. The BLR model's ability to handle diverse genomic features, perform regularization, conduct variable selection, and integrate information from multiple traits, genders, and ancestries demonstrates its utility in understanding the genetic architecture of complex traits. Our study provides insights into the potential of the BLR model to prioritize gene sets, offering a flexible framework applicable to various datasets. This model presents opportunities for advancing personalized medicine by exploring the genetic underpinnings of multifactorial traits.</p>","PeriodicalId":49007,"journal":{"name":"PLoS Genetics","volume":null,"pages":null},"PeriodicalIF":4.0000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS Genetics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1371/journal.pgen.1011463","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0

Abstract

Genome-wide association studies (GWAS) provide valuable insights into the genetic architecture of complex traits, yet interpreting their results remains challenging due to the polygenic nature of most traits. Gene set analysis offers a solution by aggregating genetic variants into biologically relevant pathways, enhancing the detection of coordinated effects across multiple genes. In this study, we present and evaluate a gene set prioritization approach utilizing Bayesian Linear Regression (BLR) models to uncover shared genetic components among different phenotypes and facilitate biological interpretation. Through extensive simulations and analyses of real traits, we demonstrate the efficacy of the BLR model in prioritizing pathways for complex traits. Simulation studies reveal insights into the model's performance under various scenarios, highlighting the impact of factors such as the number of causal genes, proportions of causal variants, heritability, and disease prevalence. Comparative analyses with MAGMA (Multi-marker Analysis of GenoMic Annotation) demonstrate BLR's superior performance, especially in highly overlapped gene sets. Application of both single-trait and multi-trait BLR models to real data, specifically GWAS summary data for type 2 diabetes (T2D) and related phenotypes, identifies significant associations with T2D-related pathways. Furthermore, comparison between single- and multi-trait BLR analyses highlights the superior performance of the multi-trait approach in identifying associated pathways, showcasing increased statistical power when analyzing multiple traits jointly. Additionally, enrichment analysis with integrated data from various public resources supports our results, confirming significant enrichment of diabetes-related genes within the top T2D pathways resulting from the multi-trait analysis. The BLR model's ability to handle diverse genomic features, perform regularization, conduct variable selection, and integrate information from multiple traits, genders, and ancestries demonstrates its utility in understanding the genetic architecture of complex traits. Our study provides insights into the potential of the BLR model to prioritize gene sets, offering a flexible framework applicable to various datasets. This model presents opportunities for advancing personalized medicine by exploring the genetic underpinnings of multifactorial traits.

贝叶斯线性回归模型对复杂疾病基因集优先级排序的评估。
全基因组关联研究(GWAS)为了解复杂性状的遗传结构提供了宝贵的见解,但由于大多数性状具有多基因性,解释其结果仍然具有挑战性。基因组分析提供了一种解决方案,它将基因变异聚合到生物学相关的途径中,从而提高了对跨多个基因的协调效应的检测。在本研究中,我们介绍并评估了一种利用贝叶斯线性回归(BLR)模型进行基因组优先排序的方法,以发现不同表型之间的共有遗传成分并促进生物学解释。通过对实际性状的大量模拟和分析,我们证明了贝叶斯线性回归模型在确定复杂性状通路优先顺序方面的功效。模拟研究揭示了该模型在各种情况下的表现,突出了因果基因数量、因果变异比例、遗传率和疾病流行率等因素的影响。与 MAGMA(Multi-marker Analysis of GenoMic Annotation)的比较分析表明,BLR 性能优越,尤其是在高度重叠的基因集中。将单性状和多性状 BLR 模型应用于真实数据,特别是 2 型糖尿病(T2D)和相关表型的 GWAS 总结数据,发现了与 T2D 相关通路的显著关联。此外,单性状和多性状 BLR 分析之间的比较凸显了多性状方法在确定相关通路方面的优越性能,显示了联合分析多个性状时统计能力的提高。此外,利用来自各种公共资源的综合数据进行的富集分析也支持我们的结果,证实了多性状分析得出的顶级 T2D 通路中糖尿病相关基因的显著富集。BLR 模型能够处理不同的基因组特征、执行正则化、进行变量选择以及整合来自多个性状、性别和祖先的信息,这证明了它在理解复杂性状的遗传结构方面的实用性。我们的研究深入揭示了 BLR 模型确定基因集优先次序的潜力,提供了一个适用于各种数据集的灵活框架。该模型通过探索多因素性状的遗传基础,为推进个性化医疗提供了机会。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
PLoS Genetics
PLoS Genetics GENETICS & HEREDITY-
自引率
2.20%
发文量
438
期刊介绍: PLOS Genetics is run by an international Editorial Board, headed by the Editors-in-Chief, Greg Barsh (HudsonAlpha Institute of Biotechnology, and Stanford University School of Medicine) and Greg Copenhaver (The University of North Carolina at Chapel Hill). Articles published in PLOS Genetics are archived in PubMed Central and cited in PubMed.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信