scE2EGAE: enhancing single-cell RNA-Seq data analysis through an end-to-end cell-graph-learnable graph autoencoder with differentiable edge sampling
Authors: Shuo Wang, Yuanning Liu, Hao Zhang, Zhen Liu
Journal: Biology Direct, vol. 20, no. 1, 66 (published 2025-05-27)
DOI: 10.1186/s13062-025-00616-z (https://doi.org/10.1186/s13062-025-00616-z)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108024/pdf/
Citations: 0
Abstract
Background: Single-cell RNA sequencing (scRNA-Seq) reveals biological processes and molecular-level genomic information at the level of individual cells. Numerous computational methods, including methods based on graph neural networks (GNNs), have been developed to enhance scRNA-Seq data analysis. However, existing GNN-based methods usually construct a fixed graph with the k-nearest neighbors algorithm, which may result in information loss.
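To make the fixed-graph baseline concrete, the following is a minimal sketch of k-nearest-neighbor graph construction in plain Python. The function name `knn_graph`, the toy 2-D "cells", and the use of Euclidean distance are illustrative assumptions, not details taken from the paper.

```python
import math

def knn_graph(points, k):
    """Return {i: [indices of the k nearest other points]} (Euclidean distance).

    Hypothetical helper for illustration only; real pipelines usually run
    this on a low-dimensional embedding of the expression matrix.
    """
    edges = {}
    for i, p in enumerate(points):
        # Distance from cell i to every other cell, paired with its index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges[i] = [j for _, j in dists[:k]]
    return edges

# Toy data: three cells near the origin and two cells far away.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(cells, k=2)

# Cell 3 has only one close neighbor (cell 4), yet k=2 forces a second edge
# to a distant cell. The graph is fixed before training and never revised.
print(graph[3])  # [4, 2]
```

Because every cell receives exactly k edges regardless of how well those neighbors match, a graph built this way can contain edges that do not reflect real cell-to-cell relationships, which is the limitation the abstract points to.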
Methods: To address this problem, we propose scE2EGAE, which learns the cell graph during training. First, the scRNA-Seq data are fed into a deep count autoencoder (DCA). Second, the hidden representations of the DCA are extracted and used to generate cell-to-cell graph edges through a straight-through estimator (STE) based on top-k sampling and Gumbel-Softmax. Finally, the generated cell-to-cell graph and the scRNA-Seq data are fed into a GNN-based downstream task. In this paper, the downstream task is a graph autoencoder that denoises scRNA-Seq data.
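The edge-generation step can be sketched as the forward pass of Gumbel top-k sampling for a single cell. This is a hedged illustration in plain Python, not the authors' implementation: the function name `gumbel_topk_edges`, the temperature `tau`, and the toy scores are assumptions, and the scores stand in for whatever similarity the model derives from the DCA hidden representations.

```python
import math
import random

def gumbel_topk_edges(scores, k, tau=1.0, rng=random):
    """Sample k neighbors for one cell from unnormalized edge scores.

    Returns (hard, soft): `hard` is a 0/1 edge mask with exactly k ones,
    `soft` is the Gumbel-Softmax relaxation over the same perturbed scores.
    """
    perturbed = []
    for s in scores:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        perturbed.append((s + gumbel) / tau)

    # Soft relaxation: numerically stable softmax of the perturbed scores.
    m = max(perturbed)
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Hard sample: keep the k largest perturbed scores as edges.
    order = sorted(range(len(scores)), key=lambda j: perturbed[j], reverse=True)
    top = set(order[:k])
    hard = [1.0 if j in top else 0.0 for j in range(len(scores))]
    return hard, soft

rng = random.Random(0)
hard, soft = gumbel_topk_edges([2.0, 0.5, 1.0, -1.0], k=2, rng=rng)

# Straight-through estimator (not executed here, since it needs autograd):
# a framework would emit hard + soft - stop_gradient(soft), so the forward
# value equals `hard` while gradients flow through `soft`.
```

The discrete top-k choice has no useful gradient on its own. The straight-through trick noted in the final comment is what would let the edge scores be trained jointly with the downstream graph network.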
Results: We evaluate scE2EGAE on eight public scRNA-Seq datasets and compare it with seven existing scRNA-Seq denoising methods. The experiments cover: 1) denoising performance, measured by mean absolute error, Pearson correlation coefficient, and cosine similarity; 2) clustering performance on the denoised results, measured by adjusted Rand index, normalized mutual information, and silhouette score; and 3) cell trajectory inference on the denoised results, measured by the pseudo-temporal ordering score. On the scRNA-Seq denoising task, scE2EGAE outperforms most of the compared methods, indicating that it can learn cell-to-cell graphs that capture real cell-to-cell relationships.
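The three denoising metrics named above can be written out in a few lines. The sketch below assumes the ground-truth and denoised expression matrices have been flattened into equal-length vectors; how the paper aggregates over cells and genes is not stated in the abstract, and the toy values are made up for illustration.

```python
import math

def mean_absolute_error(x, y):
    """Average absolute difference between paired values (lower is better)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

# Toy flattened expression values (hypothetical).
truth = [0.0, 1.0, 2.0, 4.0]
denoised = [0.5, 1.0, 2.5, 4.0]

mae = mean_absolute_error(truth, denoised)
pcc = pearson(truth, denoised)
cos = cosine_similarity(truth, denoised)
print(mae)  # 0.25
```

Mean absolute error penalizes absolute deviations, while the Pearson coefficient and cosine similarity reward agreement in pattern regardless of scale, so the three metrics give complementary views of denoising quality.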
Conclusions: We validate the proposed scE2EGAE method by applying it to the denoising of scRNA-Seq data. The method learns inter-cellular relationships and constructs cell-to-cell graphs, thereby enhancing the downstream analysis of scRNA-Seq data. Our approach may inform future research on GNN-based scRNA-Seq analysis methods and has broad application prospects.
About the journal:
Biology Direct serves the life science research community as an open access, peer-reviewed online journal, providing authors and readers with an alternative to the traditional model of peer review. Biology Direct considers original research articles, hypotheses, comments, discovery notes and reviews in subject areas currently identified as those most conducive to the open review approach, primarily those with a significant non-experimental component.