Learning Gaussian Graphical Models from Correlated Data
Zeyuan Song, Sophia Gunn, Stefano Monti, Gina Marie Peloso, Ching-Ti Liu, Kathryn Lunetta, Paola Sebastiani
Frontiers in Systems Biology, volume 5 (2025). Journal Article. Epub 2025-07-03.
DOI: https://doi.org/10.3389/fsysb.2025.1589079
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323441/pdf/
Abstract
Gaussian Graphical Models (GGMs) are network models that use partial correlation rather than correlation to represent complex relationships among multiple variables. Partial correlation shows the relation between two variables after "adjusting" for the effects of the other variables, which leads to more parsimonious and interpretable models. There are well-established procedures for building GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data that result in correlated observations, and ignoring this correlation among observations can lead to inflated Type I error. In this paper, we propose a cluster-based bootstrap algorithm to infer GGMs from correlated data. We use extensive simulations of correlated data from family-based studies to show that, when there is a sufficient number of clusters, the proposed bootstrap method does not inflate the Type I error while retaining statistical power compared to alternative solutions. We apply our method to learn the GGM that represents complex relations between 47 Polygenic Risk Scores generated using genome-wide genotype data from the Long Life Family Study. By comparing it to the conventional methods that ignore within-cluster correlation, we show that our method controls the Type I error well without power loss.
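To make the idea of a cluster-based bootstrap concrete, the following is a minimal sketch, not the authors' algorithm, whose details the abstract does not give. It assumes that whole clusters (for example, families) are resampled with replacement, that partial correlations are computed from the unregularized inverse sample covariance (which requires more observations than variables), and that an edge is kept when its bootstrap percentile interval excludes zero. The function names `partial_correlations` and `cluster_bootstrap_edges`, and the parameters `n_boot` and `alpha`, are hypothetical.

```python
import numpy as np


def partial_correlations(X):
    """Partial correlation matrix from the inverse sample covariance.

    Assumes more rows (observations) than columns (variables), so the
    sample covariance is invertible.
    """
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    pcor = (pcor + pcor.T) / 2.0  # remove tiny numerical asymmetry
    np.fill_diagonal(pcor, 1.0)
    return pcor


def cluster_bootstrap_edges(X, cluster_ids, n_boot=200, alpha=0.05, seed=0):
    """Illustrative cluster bootstrap for GGM edge selection.

    Assumption (not taken from the paper): an edge is kept when the
    (alpha/2, 1 - alpha/2) percentile interval of its bootstrapped
    partial correlation excludes zero.
    """
    rng = np.random.default_rng(seed)
    clusters = np.unique(cluster_ids)
    rows_by_cluster = [np.flatnonzero(cluster_ids == c) for c in clusters]
    p = X.shape[1]
    draws = np.empty((n_boot, p, p))
    for b in range(n_boot):
        # Resample clusters, not individual rows, so that the
        # within-cluster correlation is preserved in every replicate.
        picked = rng.integers(0, len(clusters), size=len(clusters))
        rows = np.concatenate([rows_by_cluster[i] for i in picked])
        draws[b] = partial_correlations(X[rows])
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    edges = (lo > 0) | (hi < 0)
    np.fill_diagonal(edges, False)  # no self-loops
    return edges, lo, hi
```

Resampling at the cluster level is the key design choice here: a row-level bootstrap would treat related individuals as independent, which is the situation the abstract identifies as inflating the Type I error.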