Inductive graph neural network framework for imputation of single-cell RNA sequencing data

IF 3.9; Zone 2 (Engineering & Technology); Q2 (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)
Boneshwar V K, Deepesh Agarwal, Bala Natarajan, Babji Srinivasan
DOI: 10.1016/j.compchemeng.2025.109031
Journal: Computers & Chemical Engineering, volume 195, Article 109031
Publication date: 2025-02-06
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0098135425000353

Abstract

Single-cell RNA sequencing (scRNA-seq) has transformed biological research, enabling detailed analysis of disease pathways, cellular differentiation, and immune responses at a cellular level. However, the noisy and sparse nature of scRNA-seq datasets often impedes accurate downstream analyses. Cell clustering and gene imputation serve as foundational tasks in harnessing scRNA-seq data for complex biological insights. While various graph-based methods have been developed to enhance imputation and clustering accuracy, traditional transductive models require entire graphs during training, limiting computational efficiency on large biological networks. This study introduces a novel inductive framework that efficiently learns relationships among graph nodes by utilizing subgraphs rather than full neighbor sets for node embedding generation, significantly reducing computational demands while maintaining robust performance. The proposed model achieves up to 60% improvement in Silhouette score, 14.9% in Adjusted Rand Index, 48% in runtime, and 4.5% in L1 median error over baseline models, validating the effectiveness of inductive graph learning. The framework was evaluated on three diverse scRNA-seq datasets: GSE75748 (progenitor cell types derived from human embryonic stem cells (hESCs)), GSE131928 (adult and pediatric IDH-wildtype glioblastomas (GBM)), and Goolam et al. (blastomeres from early-stage Mus musculus (mouse) embryos collected at the 2-cell, 4-cell, 8-cell, 16-cell, and 32-cell stages of preimplantation development). Across these datasets it demonstrates scalability and adaptability, offering a reliable approach for future applications in trajectory inference and gene pathway analysis.
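The abstract does not describe the model's architecture, so the following is only a minimal sketch of the general idea it names: building a node embedding from a sampled subset of neighbors instead of the full neighbor set. It uses plain mean aggregation with no learned weights, in the style of GraphSAGE, and should not be read as the authors' method. The toy cell graph `adj`, the expression values in `features`, and all function names are hypothetical.

```python
import random


def sample_neighbors(adj, node, k, rng):
    """Return at most k neighbors of `node`, sampled without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def mean_vector(vectors, dim):
    """Element-wise mean of equal-length vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed(adj, features, node, k=2, depth=1, rng=None):
    """Concatenate a node's own representation with the mean of the
    representations of up to k sampled neighbors, repeated `depth` times."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    if depth == 0:
        return list(features[node])
    own = embed(adj, features, node, k, depth - 1, rng)
    sampled = sample_neighbors(adj, node, k, rng)
    neigh = [embed(adj, features, n, k, depth - 1, rng) for n in sampled]
    return own + mean_vector(neigh, len(own))


# Hypothetical cell-cell graph: each cell lists its neighboring cells.
adj = {
    "c1": ["c2", "c3", "c4"],
    "c2": ["c1"],
    "c3": ["c1"],
    "c4": ["c1"],
    "c5": ["c1", "c2"],  # a cell treated as unseen during training
}
# Hypothetical two-gene expression profile per cell.
features = {
    "c1": [1.0, 0.0],
    "c2": [3.0, 2.0],
    "c3": [0.0, 4.0],
    "c4": [2.0, 2.0],
    "c5": [5.0, 1.0],
}

# Embedding the unseen cell needs only its local neighborhood, not the full graph.
print(embed(adj, features, "c5", k=2, depth=1))
```

Because each embedding touches at most `k` neighbors per hop, the cost per node is bounded by `k` and `depth` rather than by the size of the whole graph, which is the kind of saving an inductive, subgraph-based approach aims for.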
Source journal
Computers & Chemical Engineering (Engineering & Technology: Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Publication volume: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.