scGraPhT: Merging Transformers and Graph Neural Networks for Single-Cell Annotation

Impact factor: 3.0; Region 3 (Computer Science); JCR: Q2 (Engineering, Electrical & Electronic)
Emirhan Koç;Emre Kulkul;Gülara Kaynar;Tolga Çukur;Murat Acar;Aykut Koç
DOI: 10.1109/TSIPN.2025.3573591
Journal: IEEE Transactions on Signal and Information Processing over Networks, vol. 11, pp. 505-519
Publication date: 2025-03-26
Publication type: Journal Article
Open access: No
URL: https://ieeexplore.ieee.org/document/11015257/
Citation count: 0

Abstract

The invention of single-cell RNA sequencing (scRNA-seq) has enabled transcriptomic examination of individual cells, uncovering cell-to-cell phenotypic heterogeneity within isogenic cell populations. Inevitably, cell type annotation, which involves identifying and characterizing cells based on their unique molecular profiles, has emerged as a fundamental, albeit challenging, task in scRNA-seq data analysis. Recently, deep learning techniques with their data-driven priors have shown significant promise in this task. On the one hand, task-agnostic transformers pre-trained on large-scale biological databases capture generalizable representations but cannot characterize intricate relationships between genes and cells. On the other hand, task-specific graph neural networks (GNNs) trained on target datasets can characterize entity relationships, but they can suffer from poor generalizability. Furthermore, existing GNNs focus on either homogeneous or heterogeneous relationships, failing to capture the full cellular complexity. Here, we propose scGraPhT, a unified transformer–graph model that combines pre-trained transformer embeddings of scRNA-seq data with a multilayer GNN to capture cell-cell, cell-gene, and gene-gene relationships. Unlike previous GNNs, scGraPhT examines both homogeneous and heterogeneous relationships through subgraph layers to offer a more comprehensive assessment. Since the graph construction uses transformer-derived embeddings, scGraPhT does not require costly training procedures and can also be adapted to leverage any transformer-based single-cell annotation method, such as scGPT or scBERT. Demonstrations on three scRNA-seq benchmark datasets indicate that scGraPhT outperforms state-of-the-art annotation methods without compromising efficiency. Utilizing Grad-CAM, we demonstrate how the GNN and transformer components complement each other to enhance performance. We share our source code and datasets for reproducibility.
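
The idea of building a graph over cells and genes from transformer-derived embeddings can be illustrated with a minimal NumPy sketch. This is not the scGraPhT implementation: the edge rules (k-nearest neighbours on cosine similarity for cell-cell and gene-gene edges, nonzero expression for cell-gene edges), the single block adjacency in place of the paper's subgraph layers, the random stand-in embeddings, and all function names (`knn_adjacency`, `build_graph`, `gcn_layer`) are assumptions made for illustration only.

```python
import numpy as np

def knn_adjacency(emb, k):
    """Symmetric k-nearest-neighbour adjacency from cosine similarity (assumed edge rule)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)              # exclude self-matches
    nearest = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar rows
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(emb.shape[0]), k)
    adj[rows, nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)               # make the graph undirected

def build_graph(cell_emb, gene_emb, expr, k=3):
    """Block adjacency over [cells; genes] holding three edge types."""
    cc = knn_adjacency(cell_emb, k)             # cell-cell (homogeneous)
    gg = knn_adjacency(gene_emb, k)             # gene-gene (homogeneous)
    cg = (expr > 0).astype(float)               # cell-gene (heterogeneous), assumed rule
    return np.block([[cc, cg], [cg.T, gg]])

def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
n_cells, n_genes, dim, hidden = 8, 5, 16, 4
cell_emb = rng.normal(size=(n_cells, dim))        # stand-in for transformer cell embeddings
gene_emb = rng.normal(size=(n_genes, dim))        # stand-in for transformer gene embeddings
expr = rng.poisson(0.8, size=(n_cells, n_genes))  # toy count matrix, not real data

adj = build_graph(cell_emb, gene_emb, expr)
feats = np.vstack([cell_emb, gene_emb])           # node features: cells first, then genes
hidden_repr = gcn_layer(adj, feats, rng.normal(size=(dim, hidden)))
print(adj.shape, hidden_repr.shape)
```

In this toy setup every node, whether cell or gene, receives messages from both node types in a single step. A model in the spirit of the abstract would instead process the homogeneous and heterogeneous subgraphs in separate layers and attach a classifier to the cell rows.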
Source journal: IEEE Transactions on Signal and Information Processing over Networks (Computer Science: Computer Networks and Communications)
CiteScore: 5.80
Self-citation rate: 12.50%
Number of publications: 56
Journal description: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g., time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, and applications of distributed signal processing.