Learning Gaussian Graphical Models from Correlated Data.

Impact factor: 2.3
Frontiers in Systems Biology. Pub Date: 2025-01-01. Epub Date: 2025-07-03. DOI: 10.3389/fsysb.2025.1589079
Zeyuan Song, Sophia Gunn, Stefano Monti, Gina Marie Peloso, Ching-Ti Liu, Kathryn Lunetta, Paola Sebastiani
Volume 5. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323441/pdf/
Citations: 0

Abstract

Gaussian Graphical Models (GGMs) are a type of network model that uses partial correlation rather than correlation to represent complex relationships among multiple variables. The advantage of partial correlation is that it shows the relation between two variables after "adjusting" for the effects of the other variables, which leads to more parsimonious and interpretable models. There are well-established procedures to build GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data that result in correlated observations, and ignoring this correlation among observations can lead to inflated Type I error. In this paper, we propose a cluster-based bootstrap algorithm to infer GGMs from correlated data. We use extensive simulations of correlated data from family-based studies to show that, when there is a sufficient number of clusters, the proposed bootstrap method does not inflate the Type I error while retaining statistical power compared to alternative solutions. We apply our method to learn the GGM that represents complex relations between 47 Polygenic Risk Scores generated using genome-wide genotype data from the Long Life Family Study. By comparing it to the conventional methods that ignore within-cluster correlation, we show that our method controls the Type I error well without power loss.
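
The abstract does not spell out the steps of the proposed algorithm, so the following is only a rough sketch of the general idea of a cluster-based bootstrap for partial correlations, not the authors' implementation. It assumes that partial correlations are obtained from the unregularized inverse of the sample covariance matrix, that whole clusters (for example, families) are resampled with replacement, and that an edge is kept when its percentile bootstrap interval excludes zero. The function names `partial_correlations` and `cluster_bootstrap_edges`, the parameters `n_boot` and `alpha`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix from the inverse of the sample covariance."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def cluster_bootstrap_edges(data, cluster_ids, n_boot=500, alpha=0.05, seed=0):
    """Keep edge (i, j) when its percentile bootstrap interval excludes zero.

    Whole clusters are resampled with replacement, so the within-cluster
    correlation is preserved in every bootstrap replicate. (Assumed design,
    not taken from the paper.)
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    rows_by_cluster = [np.flatnonzero(cluster_ids == c) for c in clusters]

    p = data.shape[1]
    boot = np.empty((n_boot, p, p))
    for b in range(n_boot):
        # Draw cluster indices, then stack all rows of each drawn cluster.
        drawn = rng.integers(0, len(clusters), size=len(clusters))
        rows = np.concatenate([rows_by_cluster[k] for k in drawn])
        boot[b] = partial_correlations(data[rows])

    lower = np.quantile(boot, alpha / 2, axis=0)
    upper = np.quantile(boot, 1 - alpha / 2, axis=0)
    edges = (lower > 0) | (upper < 0)
    np.fill_diagonal(edges, False)  # no self-loops
    return edges, lower, upper


if __name__ == "__main__":
    # Simulated example: 100 clusters of 4 members sharing a random effect.
    rng = np.random.default_rng(1)
    ids = np.repeat(np.arange(100), 4)
    x = rng.normal(size=(100, 3))[ids] + rng.normal(size=(400, 3))
    x[:, 1] += 0.8 * x[:, 0]  # induce a true edge between variables 0 and 1
    edges, lo, hi = cluster_bootstrap_edges(x, ids, n_boot=200)
    print(edges.astype(int))
```

Resampling clusters rather than individual rows is what distinguishes this from an ordinary bootstrap: each replicate keeps the dependence among members of the same cluster, so the spread of the bootstrap estimates reflects the correlated sampling design. With many variables relative to the number of observations, the plain matrix inverse used here would likely need to be replaced by a regularized estimator.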
