scGraPhT: Merging Transformers and Graph Neural Networks for Single-Cell Annotation

Impact factor: 3.0 | Zone 3, Computer Science | JCR Q2, Engineering, Electrical & Electronic

IEEE Transactions on Signal and Information Processing over Networks | Pub Date: 2025-03-26 | DOI: 10.1109/TSIPN.2025.3573591

Emirhan Koç; Emre Kulkul; Gülara Kaynar; Tolga Çukur; Murat Acar; Aykut Koç

Volume 11, pages 505-519. Journal article, not open access. https://ieeexplore.ieee.org/document/11015257/


Abstract

The invention of single-cell RNA sequencing (scRNA-seq) has enabled transcriptomic examination of individual cells, uncovering cell-to-cell phenotypic heterogeneity within isogenic cell populations. Consequently, cell type annotation, which involves identifying and characterizing cells based on their unique molecular profiles, has emerged as a fundamental, albeit challenging, task in scRNA-seq data analysis. Recently, deep learning techniques with their data-driven priors have shown significant promise in this task. On the one hand, task-agnostic transformers pre-trained on large-scale biological databases capture generalizable representations but cannot characterize intricate relationships between genes and cells. On the other hand, task-specific graph neural networks (GNNs) trained on target datasets can characterize entity relationships but can suffer from poor generalizability. Furthermore, existing GNNs focus on either homogeneous or heterogeneous relationships, failing to capture the full cellular complexity. Here, we propose scGraPhT, a unified transformer–graph model that combines pre-trained transformer embeddings of scRNA-seq data with a multilayer GNN to capture cell-cell, cell-gene, and gene-gene relationships. Unlike previous GNNs, scGraPhT examines both homogeneous and heterogeneous relationships through subgraph layers to offer a more comprehensive assessment. Because the graph construction uses transformer-derived embeddings, scGraPhT does not require costly training procedures and can be adapted to leverage any transformer-based single-cell annotation method, such as scGPT or scBERT. Demonstrations on three scRNA-seq benchmark datasets indicate that scGraPhT outperforms state-of-the-art annotation methods without compromising efficiency. Using Grad-CAM, we demonstrate how the GNN and transformer components complement each other to enhance performance. We share our source code and datasets for reproducibility.
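To make the idea of a graph over cell-cell, cell-gene, and gene-gene relationships concrete, the following is a minimal illustrative sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the random arrays stand in for pre-trained transformer embeddings, homogeneous edges are chosen by cosine-similarity k-nearest neighbours, cell-gene edges are drawn wherever a gene has a nonzero count, and a single generic graph-convolution step is applied to the joint graph. The abstract does not specify these choices, and the subgraph layers it mentions are not reproduced here. The function names `knn_adjacency`, `build_graph`, and `gcn_layer` are hypothetical.

```python
import numpy as np


def knn_adjacency(emb, k):
    """Symmetric k-nearest-neighbour adjacency from cosine similarity (assumed edge rule)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # never pick a node as its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar nodes per row
    adj = np.zeros(sim.shape)
    rows = np.repeat(np.arange(emb.shape[0]), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # make the relation undirected


def build_graph(cell_emb, gene_emb, expr, k=3):
    """Block adjacency over [cells, genes] with homogeneous and heterogeneous edges."""
    cc = knn_adjacency(cell_emb, k)           # cell-cell (homogeneous)
    gg = knn_adjacency(gene_emb, k)           # gene-gene (homogeneous)
    cg = (expr > 0).astype(float)             # cell-gene (heterogeneous), assumed rule
    return np.block([[cc, cg], [cg.T, gg]])


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


# Toy data: random stand-ins for transformer-derived embeddings and a count matrix.
rng = np.random.default_rng(0)
n_cells, n_genes, dim = 6, 5, 8
cell_emb = rng.normal(size=(n_cells, dim))
gene_emb = rng.normal(size=(n_genes, dim))
expr = rng.poisson(1.0, size=(n_cells, n_genes))

adj = build_graph(cell_emb, gene_emb, expr, k=2)
feats = np.vstack([cell_emb, gene_emb])       # node features in the same [cells, genes] order
weight = rng.normal(size=(dim, 4))
hidden = gcn_layer(adj, feats, weight)
print(hidden.shape)                           # (11, 4): one 4-d vector per cell and gene node
```

In this sketch, cell and gene nodes share one adjacency matrix, so a single propagation step mixes information across all three edge types. A per-relation design would instead propagate over the `cc`, `cg`, and `gg` blocks separately before combining the results.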




Source journal

IEEE Transactions on Signal and Information Processing over Networks Computer Science-Computer Networks and Communications

CiteScore: 5.80

Self-citation rate: 12.50%


About the journal: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g., time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, and applications of distributed signal processing.