Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data.

Impact factor 3.5; Zone 2 (Biology); JCR Q1, Mathematical & Computational Biology
Arno van Hilten, Jeroen van Rooij, M Arfan Ikram, Wiro J Niessen, Joyce B J van Meurs, Gennady V Roshchupkin
Journal: NPJ Systems Biology and Applications
Publication type: Journal Article
Published: 2024-08-02
DOI: https://doi.org/10.1038/s41540-024-00405-w
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11297229/pdf/
Citation count: 0

Abstract

Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, N_total = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status with an overall mean AUC of 0.95 (95% CI: 0.90-1.00) and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort with an R² of 0.07 (95% CI: 0.05-0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97-6.35) years with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
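The two ideas named in the abstract, cohort-wise cross-validation and a network whose connections follow prior biological knowledge, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the cohort labels and the gene-level connectivity mask are hypothetical, and the assumption that expression and methylation inputs are wired to a shared gene node is made here only for illustration.

```python
from collections import defaultdict


def cohort_wise_splits(cohort_labels):
    """Yield (held_out_cohort, train_idx, test_idx), one fold per cohort.

    Each fold tests on one complete cohort and trains on all the others,
    so performance reflects generalization to an unseen cohort.
    """
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = [
            i
            for cohort in sorted(by_cohort)
            if cohort != held_out
            for i in by_cohort[cohort]
        ]
        yield held_out, train_idx, test_idx


def masked_layer(inputs, weights, mask):
    """Forward pass of a sparsely connected layer (no bias, no activation).

    mask[j][i] is 1 if input i is allowed to feed node j (for example,
    because the feature is annotated to gene j) and 0 otherwise.
    Masked-out weights never contribute, which keeps each node interpretable.
    """
    return [
        sum(x * w * m for x, w, m in zip(inputs, w_row, m_row))
        for w_row, m_row in zip(weights, mask)
    ]


if __name__ == "__main__":
    # Hypothetical cohort labels for six samples.
    labels = ["cohort_A", "cohort_A", "cohort_B", "cohort_C", "cohort_C", "cohort_D"]
    for held_out, train_idx, test_idx in cohort_wise_splits(labels):
        print(held_out, "train:", train_idx, "test:", test_idx)

    # Hypothetical inputs: [expr_gene1, meth_gene1, expr_gene2, meth_gene2].
    features = [0.5, 0.2, 1.0, 0.3]
    weights = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
    gene_mask = [[1, 1, 0, 0], [0, 0, 1, 1]]  # each gene node sees only its own features
    print(masked_layer(features, weights, gene_mask))
```

In a real model the weights would be learned and the mask would come from a gene annotation; the sketch only shows why a held-out cohort never leaks into training and why each gene node can be inspected on its own.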


Source journal: NPJ Systems Biology and Applications (Mathematics, Applied Mathematics)
CiteScore: 5.80
Self-citation rate: 0.00%
Articles published: 46
Review time: 8 weeks
Journal description: npj Systems Biology and Applications is an online Open Access journal dedicated to publishing the premier research that takes a systems-oriented approach. The journal aims to provide a forum for the presentation of articles that help define this nascent field, as well as those that apply the advances to wider fields. We encourage studies that integrate, or aid the integration of, data, analyses and insight from molecules to organisms and broader systems. Important areas of interest include not only fundamental biological systems and drug discovery, but also applications to health, medical practice and implementation, big data, biotechnology, food science, human behaviour, broader biological systems and industrial applications of systems biology. We encourage all approaches, including network biology, application of control theory to biological systems, computational modelling and analysis, comprehensive and/or high-content measurements, theoretical, analytical and computational studies of system-level properties of biological systems and computational/software/data platforms enabling such studies.