Simultaneous Representation Learning of Multi-Omics and Clinical Outcome Data via a Supervised Knowledge-Guided Bayesian Factor Model.

Impact factor 1.8 · CAS Tier 4 (Medicine) · JCR Q3, MATHEMATICAL & COMPUTATIONAL BIOLOGY
Qiyiwen Zhang, Changgee Chang, Chong Jin, Li Shen, Qi Long
Statistics in Medicine, vol. 45, no. 10-12, e70570 (journal article, published 2026-05-01). DOI: 10.1002/sim.70570
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13110451/pdf/
Citations: 0

Abstract

With the advent of high-throughput techniques, multi-omics data and various clinical outcomes have been collected for a range of diseases. Multi-omics data play a crucial role in uncovering complex biological processes, yet simultaneous representation learning of such high-dimensional, heterogeneous multi-modality data along with clinical outcomes remains limited. To address this gap, we propose a supervised knowledge-guided Bayesian factor model for integrative analysis of multi-omics and clinical outcome data. The proposed method simultaneously extracts an informative low-dimensional representation and predicts one or more clinical outcomes of interest. The two-level adaptive shrinkage in the novel hierarchical priors allows for the identification of both active modalities and features, resulting in a biologically meaningful structural identification of the high-dimensional data. Moreover, the method is robust to noisy edges in biological graphs that do not align with ground truth. Finally, the proposed method can handle different data types including both continuous and categorical data. Extensive simulation studies and real data analyses of Alzheimer's disease (AD) data demonstrate the advantages of the proposed approach over existing methods. Notably, our analysis of multi-omics and imaging phenotype data from ADNI provides meaningful insights into the underlying biological mechanisms of AD.
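The abstract describes the model only at a high level. As a rough illustration of what "supervised" and "two-level adaptive shrinkage" mean structurally, the Python sketch below simulates data from a generic supervised multi-view factor model: shared latent factors drive every omics modality and the clinical outcome, and each loading is scaled by a modality-level factor times a feature-level factor. This is an assumption-laden toy, not the authors' prior or inference procedure. The function name `simulate_supervised_factor_data`, the half-normal feature scales, the logistic link for a binary outcome, and switching a modality off by fixing its scale to exactly zero are all hypothetical choices; the knowledge-guided graph prior and posterior inference are not shown.

```python
import math
import random


def simulate_supervised_factor_data(n, modality_sizes, k, modality_scales,
                                    feature_scale_sd=1.0, noise_sd=0.5, seed=0):
    """Draw one synthetic data set from a generic supervised multi-view factor model.

    Hypothetical sketch: the loading of feature j in modality m on factor c is
    N(0, (tau_m * lambda_j)^2), where tau_m is a modality-level scale and
    lambda_j a feature-level scale (two levels of shrinkage). Setting
    tau_m = 0 switches the whole modality off.
    """
    rng = random.Random(seed)
    # Shared latent factors z_i ~ N(0, I_k) link every modality and the outcome.
    z = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]

    loadings, views = [], []
    for p_m, tau_m in zip(modality_sizes, modality_scales):
        # Feature-level scales: half-normal draws, so some features load weakly.
        lam = [abs(rng.gauss(0.0, feature_scale_sd)) for _ in range(p_m)]
        # Loading = standard normal draw * modality scale * feature scale.
        w = [[rng.gauss(0.0, 1.0) * tau_m * lam[j] for _ in range(k)]
             for j in range(p_m)]
        # Observed features: x_ij = sum_c w_jc * z_ic + Gaussian noise.
        x = [[sum(w[j][c] * z[i][c] for c in range(k)) + rng.gauss(0.0, noise_sd)
              for j in range(p_m)] for i in range(n)]
        loadings.append(w)
        views.append(x)

    # Supervision: the outcome depends on the same latent factors.
    beta = [rng.gauss(0.0, 1.0) for _ in range(k)]
    eta = [sum(beta[c] * z[i][c] for c in range(k)) for i in range(n)]
    y_cont = [e + rng.gauss(0.0, noise_sd) for e in eta]
    # Binary outcome via a logistic link, as one example of a categorical variable.
    y_bin = [1 if rng.random() < 1.0 / (1.0 + math.exp(-e)) else 0 for e in eta]
    return z, loadings, views, y_cont, y_bin


# Usage: three modalities, the third switched off by a zero modality-level scale.
z, W, X, y, y_bin = simulate_supervised_factor_data(
    n=50, modality_sizes=[20, 30, 10], k=3, modality_scales=[1.0, 1.0, 0.0])
# Every loading in the inactive modality is exactly zero.
print(all(v == 0 for row in W[2] for v in row))  # True
```

In a fitted model the shrinkage scales would be learned rather than fixed, so an inactive modality or feature would show loadings shrunk toward zero rather than exactly zero; the hard zero here only makes the two-level structure easy to see.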

Source journal: Statistics in Medicine
Subject category: Medicine: Public, Environmental & Occupational Health
CiteScore: 3.40
Self-citation rate: 10.00%
Articles published: 334
Review time: 2-4 weeks
About the journal: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.