Gene expression inference based on graph neural networks using L1000 data.

Impact factor 6.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS
Tae Hyun Kim, Harim Kim, Hyunjin Hwang, Shinwhan Kang, Kijung Shin, Inwha Baek
Briefings in Bioinformatics, vol. 26, no. 3 (published 2025-05-01). DOI: 10.1093/bib/bbaf273
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12161499/pdf/
Citations: 0

Abstract

Gene expression profiles can serve as proxies for cellular states and provide valuable insights for discovering functional connections across diverse cellular contexts. A cost-effective method called L1000 has been developed to generate gene expression profiles for over a million different conditions; it directly measures a reduced set of landmark genes and infers the expression of the remaining genes from them. Because this inference relies on linear regression, nonlinear regression methods, including deep learning models, have also been evaluated. However, these approaches treat gene expression data as vectors, which motivated us to investigate whether nonlinear models based on a graph structure better capture the relationships between genes that underlie gene expression profiles. In this work, we show that a graph neural network (GNN) model with genes as nodes outperforms both linear and nonlinear non-GNN models in predicting gene expression values and expression-based gene rankings. Importantly, our GNN model requires ~10-fold less information than other models to achieve comparable performance. Strategic selection of input features, or incorporation of a feature encoding the organ from which the gene expression data are derived, further improves the inference performance of the GNN model. Additionally, we evaluate the cross-platform generality of gene expression inference. Our study demonstrates that transforming RNA expression data into a graph structure effectively captures nonlinear correlations between genes, thereby enabling highly accurate and efficient prediction of gene expression profiles.
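As a rough illustration of the linear-regression baseline mentioned above, the sketch below fits ordinary least squares from a set of measured "landmark" genes to the remaining "target" genes, then scores the predictions by per-profile Spearman rank correlation, one possible way to compare expression-based gene rankings. It runs on synthetic data; the function names, data sizes, and choice of metric are illustrative assumptions, not the L1000 pipeline or the paper's evaluation protocol.

```python
import numpy as np

def design_matrix(landmark: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the regression has an intercept."""
    return np.hstack([np.ones((landmark.shape[0], 1)), landmark])

def fit_linear_inference(landmark: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Ordinary least squares: W minimizing ||[1, landmark] @ W - target||^2.

    landmark: (n_samples, n_landmark) measured expression
    target:   (n_samples, n_target) expression to be inferred
    returns:  (n_landmark + 1, n_target) weights (first row = intercepts)
    """
    w, *_ = np.linalg.lstsq(design_matrix(landmark), target, rcond=None)
    return w

def predict_linear(landmark: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Infer target-gene expression from landmark expression."""
    return design_matrix(landmark) @ w

def ranks(a: np.ndarray) -> np.ndarray:
    """0-based ranks along each row (argsort of argsort; assumes no ties)."""
    return np.argsort(np.argsort(a, axis=1), axis=1).astype(float)

def mean_spearman(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean per-profile Spearman correlation of predicted vs. true gene rankings."""
    rp, rt = ranks(pred), ranks(true)
    rp -= rp.mean(axis=1, keepdims=True)
    rt -= rt.mean(axis=1, keepdims=True)
    r = (rp * rt).sum(axis=1) / np.sqrt((rp ** 2).sum(axis=1) * (rt ** 2).sum(axis=1))
    return float(r.mean())

# Synthetic data: 20 "landmark" genes linearly determine 50 "target" genes plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 20))
y = x @ rng.normal(size=(20, 50)) + 0.1 * rng.normal(size=(200, 50))
x_train, x_test, y_train, y_test = x[:150], x[150:], y[:150], y[150:]

w = fit_linear_inference(x_train, y_train)
pred = predict_linear(x_test, w)
print(w.shape, pred.shape, round(mean_spearman(pred, y_test), 3))
```

Because the synthetic targets here are generated linearly, least squares recovers them almost perfectly; the abstract's argument is that real relationships between genes are nonlinear, which a single linear map cannot fully capture.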
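The core idea of treating genes as nodes can be sketched as a small graph convolutional network in NumPy. Everything below is hypothetical: the correlation-threshold gene graph, the two-value node encoding (observed value plus a "measured" flag), the two-layer GCN with symmetric normalization, and all sizes are illustrative choices, not the graph or architecture used in the paper. The weights are random, so the output only shows how information from measured genes propagates to unmeasured ones; a real model would learn `w1` and `w2`, for example by minimizing squared error on the unmeasured genes.

```python
import numpy as np

def correlation_graph(train_expr: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical graph: connect genes whose absolute Pearson correlation
    across training profiles exceeds `threshold` (no self-edges)."""
    corr = np.corrcoef(train_expr, rowvar=False)
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def node_features(expr: np.ndarray, measured: np.ndarray) -> np.ndarray:
    """One row per gene: [observed value (0 if unmeasured), 1.0 if measured else 0.0]."""
    return np.stack([np.where(measured, expr, 0.0), measured.astype(float)], axis=1)

def gcn_forward(a_norm, x, w1, w2):
    """Two graph-convolution layers; returns one predicted value per gene.
    Unmeasured genes start with all-zero features, so their outputs come
    entirely from messages aggregated from neighboring genes."""
    h = np.maximum(a_norm @ x @ w1, 0.0)  # layer 1 + ReLU
    return (a_norm @ h @ w2).ravel()      # layer 2, linear output

rng = np.random.default_rng(0)
n_genes = 30
train = rng.normal(size=(100, n_genes))
adj = correlation_graph(train, threshold=0.2)
a_norm = normalized_adjacency(adj)

measured = np.zeros(n_genes, dtype=bool)
measured[:10] = True  # the first 10 genes play the role of measured landmarks
x = node_features(train[0], measured)

# Random (untrained) weights: this only demonstrates the data flow.
w1 = rng.normal(scale=0.1, size=(2, 16))
w2 = rng.normal(scale=0.1, size=(16, 1))
pred = gcn_forward(a_norm, x, w1, w2)
print(adj.sum() / 2, pred.shape)
```

The ReLU applied after aggregating over neighboring genes is what makes this model nonlinear in its inputs; in practice a deep-learning framework would handle training, batching over profiles, and the choice of graph.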

Source journal: Briefings in Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.