Tae Hyun Kim, Harim Kim, Hyunjin Hwang, Shinwhan Kang, Kijung Shin, Inwha Baek
{"title":"Gene expression inference based on graph neural networks using L1000 data.","authors":"Tae Hyun Kim, Harim Kim, Hyunjin Hwang, Shinwhan Kang, Kijung Shin, Inwha Baek","doi":"10.1093/bib/bbaf273","DOIUrl":null,"url":null,"abstract":"<p><p>Gene expression profiles can serve as proxies for cellular states and provide valuable insights into the discovery of functional connections across diverse cellular contexts. A cost-effective method called L1000 has been developed to generate gene expression profiles for over a million different conditions. Since gene expression inference of this method relies on linear regression, nonlinear regression methods, including deep learning models, have been assessed. However, these approaches process gene expression data as a vector structure, motivating us to investigate whether nonlinear models based on a graph structure are more effective in capturing the relationships between genes underlying gene expression profiles. In this work, we show that the graph neural network (GNN) model with genes as nodes outperforms both linear and nonlinear non-GNN models in predicting gene expression values and expression-based gene rankings. Importantly, our GNN model requires ~10-fold less information than other models to achieve comparable performance. A strategic selection of input features, or incorporating an organ feature, from which the gene expression data are derived, further improves gene expression inference performance of the GNN model. Additionally, we evaluate the cross-platform generality of gene expression inference. Our study demonstrates that the transformation of RNA expression data into a graph structure effectively captures nonlinear correlations between genes, thereby enabling highly accurate and efficient prediction of gene expression profiles.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12161499/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Briefings in bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bib/bbaf273","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Gene expression profiles can serve as proxies for cellular states and provide valuable insights into the discovery of functional connections across diverse cellular contexts. A cost-effective method called L1000 has been developed to generate gene expression profiles for over a million different conditions. Since gene expression inference of this method relies on linear regression, nonlinear regression methods, including deep learning models, have been assessed. However, these approaches process gene expression data as a vector structure, motivating us to investigate whether nonlinear models based on a graph structure are more effective in capturing the relationships between genes underlying gene expression profiles. In this work, we show that the graph neural network (GNN) model with genes as nodes outperforms both linear and nonlinear non-GNN models in predicting gene expression values and expression-based gene rankings. Importantly, our GNN model requires ~10-fold less information than other models to achieve comparable performance. A strategic selection of input features, or incorporating an organ feature, from which the gene expression data are derived, further improves gene expression inference performance of the GNN model. Additionally, we evaluate the cross-platform generality of gene expression inference. Our study demonstrates that the transformation of RNA expression data into a graph structure effectively captures nonlinear correlations between genes, thereby enabling highly accurate and efficient prediction of gene expression profiles.
期刊介绍:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.