Graph-guided Bayesian Factor Model for Integrative Analysis of Multi-modal Data with Noisy Network Information.

Impact factor 0.4 · JCR Q4 · MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Biosciences · Published: 2024-08-11 · DOI: 10.1007/s12561-024-09452-7

Wenrui Li, Qiyiwen Zhang, Kewen Qu, Qi Long


Citations: 0

Abstract

There is a growing body of literature on factor analysis methods that can capture individual and shared structures in multi-modal data. However, few of these approaches incorporate biological knowledge such as functional genomics and functional metabolomics. Graph-guided statistical learning methods that incorporate knowledge of underlying networks have been shown to improve prediction and classification accuracy and to yield more interpretable results. Yet these methods typically use graphs extracted from existing databases or derived from subject-matter expertise, both of which are known to be incomplete and may contain false edges. To address this gap, we propose a graph-guided Bayesian factor model that accounts for network noise and identifies globally shared, partially shared, and modality-specific latent factors in multi-modal data. Specifically, we use two sources of network information, the noisy graph extracted from existing databases and the graph estimated from the observed features in the dataset at hand, to inform the model of the true underlying network via a latent scale modeling framework. This network model is coupled with a Bayesian factor analysis model with shrinkage priors that encourage feature-wise and modality-wise sparsity, thereby enabling feature selection and the identification of each type of factor. We develop an efficient Markov chain Monte Carlo algorithm for posterior sampling. We demonstrate the advantages of our method over existing methods in simulations and through analyses of gene expression and metabolomics datasets for Alzheimer's disease.
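The abstract distinguishes globally shared, partially shared, and modality-specific factors, and relies on feature-wise and modality-wise sparsity. The sketch below illustrates only that vocabulary, under our own assumptions; it is not the authors' model, prior, or MCMC sampler, and it ignores the network component entirely. It simulates data from a generic multi-modal factor model X_m = Z W_m^T + E_m with a hand-chosen, hypothetical zero pattern in the loadings, then labels each factor by which modalities' loading columns are nonzero and treats features whose loading rows are all zero as unselected. The helper names classify_factors and selected_features, the threshold tol, and all dimensions are hypothetical. With continuous shrinkage priors, posterior loadings are rarely exactly zero, so a real analysis would need a posterior-based selection rule rather than a fixed threshold.

```python
import numpy as np


def classify_factors(loadings_by_modality, tol=1e-8):
    """Label each latent factor by the set of modalities whose loadings use it.

    loadings_by_modality: list of arrays, one per modality m, each of shape
    (p_m, K) holding that modality's loading matrix W_m over K common factors.
    tol is a hypothetical threshold for treating a loading as zero.
    """
    n_mod = len(loadings_by_modality)
    n_factors = loadings_by_modality[0].shape[1]
    labels = []
    for k in range(n_factors):
        # Factor k is "active" in modality m if column k of W_m is not all (near) zero.
        active = [m for m, W_m in enumerate(loadings_by_modality)
                  if np.any(np.abs(W_m[:, k]) > tol)]
        if len(active) == n_mod:
            labels.append("globally shared")
        elif len(active) > 1:
            labels.append("partially shared")
        elif len(active) == 1:
            labels.append(f"specific to modality {active[0]}")
        else:
            labels.append("inactive")
    return labels


def selected_features(W_m, tol=1e-8):
    """Indices of features (rows of W_m) with at least one non-negligible loading."""
    return np.flatnonzero(np.any(np.abs(W_m) > tol, axis=1))


rng = np.random.default_rng(0)
n, dims = 100, [20, 15, 10]  # hypothetical sample size and features per modality

# Hypothetical modality-wise sparsity pattern: rows = modalities, columns = factors.
pattern = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [1, 0, 1, 0]])
n_factors = pattern.shape[1]

Z = rng.standard_normal((n, n_factors))  # latent factor scores shared across modalities
# Multiplying by pattern[m] zeroes whole columns of W_m (modality-wise sparsity).
W = [rng.standard_normal((p, n_factors)) * pattern[m] for m, p in enumerate(dims)]
W[0][5:, :] = 0.0  # feature-wise sparsity: only the first 5 features of modality 0 load on any factor

# Generic factor model X_m = Z W_m^T + E_m with small Gaussian noise E_m.
X = [Z @ W_m.T + 0.1 * rng.standard_normal((n, p)) for W_m, p in zip(W, dims)]

print(classify_factors(W))
# -> ['globally shared', 'partially shared', 'specific to modality 2', 'inactive']
print(selected_features(W[0]))
# -> [0 1 2 3 4]
```

In this sketch the factor type is read off from which modality blocks of each loading column are nonzero, which mirrors, in a simplified way, how modality-wise sparsity allows factors of each type to be told apart, while zero loading rows play the role of feature selection.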


Source journal: Statistics in Biosciences (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 2.00

Self-citation rate: 0.00%

Journal description: Statistics in Biosciences (SIBS) is published three times a year in print and electronic form. It aims at the development and application of statistical methods and their interface with other quantitative methods, such as computational and mathematical methods, in the biological and life sciences, health science, and biopharmaceutical and biotechnological science. SIBS publishes scientific papers and review articles in four sections, the first two of which are the primary sections. Original Articles publish novel statistical and quantitative methods in the biosciences. Bioscience Case Studies and Practice Articles publish papers that advance statistical practice in the biosciences, such as case studies, innovative applications of existing methods that further understanding of subject-matter science, and evaluations of existing methods and data sources. Review Articles publish papers that review an area of statistical and quantitative methodology, software, and data sources in the biosciences. Commentaries provide perspectives on research topics or policy issues of current quantitative interest in the biosciences, reactions to articles published in the journal, and scholarly essays. For an article to be acceptable, substantive science is essential in motivating and demonstrating the methodological development and its use. Articles published in SIBS share the goal of promoting evidence-based real-world practice and policy making through effective and timely interaction and communication between statisticians and quantitative researchers and subject-matter scientists in the biosciences.