scRCA: A Siamese network-based pipeline for annotating cell types using noisy single-cell RNA-seq reference data
Yan Liu, Chen Li, Long-Chen Shen, He Yan, Guo Wei, Robin B. Gasser, Xiaohua Hu, Jiangning Song, Dong-Jun Yu
Computers in Biology and Medicine, vol. 190, Article 110068 (published 2025-03-29). DOI: 10.1016/j.compbiomed.2025.110068
Abstract
Accurate cell type annotation is critical for single-cell RNA sequencing (scRNA-seq) data analysis. It provides insight into tissue-specific cellular heterogeneity and enables the tracking of cell state transitions. Cell type annotation is usually performed by comparison with previously annotated data (i.e., a reference), which is presumed to represent cell types accurately. However, this assumption is often problematic: factors such as human error in wet-lab experiments and methodological limitations can introduce annotation errors into the reference dataset. Because current pipelines for single-cell transcriptomic analysis do not adequately address this challenge, there is a pressing need for a computational pipeline that achieves high-quality cell type annotation using reference datasets containing inherent errors (referred to as "noise" in this study). Here, we built a Siamese network-based pipeline, termed scRCA, to accurately annotate cell types based on noisy reference data. To help users evaluate the reliability of scRCA annotations, we also developed an interpreter that explores the factors underlying the model's predictions. Our experiments demonstrate that, across 14 datasets, scRCA outperformed other widely adopted reference-based methods for cell type annotation. Using an independent dataset from four multiple myeloma patients, we further illustrated that scRCA can distinguish cancerous cells based on gene expression levels and, through its interpretable module, identify genes closely associated with multiple myeloma, providing valuable information for subsequent clinical treatment. With these advancements, we anticipate that scRCA will serve as a practical reference-based approach for accurate cell type annotation.
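The abstract does not describe scRCA's architecture or loss in detail. As a generic illustration only, the sketch below shows the core idea behind a Siamese network. Two expression profiles pass through one shared encoder, and a contrastive loss pulls pairs labelled as the same cell type together while pushing pairs of different types apart. The one-layer ReLU encoder, the `margin` parameter, the `log1p` normalisation, the Poisson toy counts, and the function names `encode` and `contrastive_loss` are all assumptions made for this example. This is not scRCA's actual model.

```python
import numpy as np

# Hypothetical, generic Siamese sketch; NOT the scRCA model from the paper.

def encode(x, w, b):
    """Shared encoder: both Siamese branches use the SAME weights (w, b)."""
    return np.maximum(0.0, x @ w + b)  # one linear layer followed by ReLU


def contrastive_loss(z1, z2, same, margin=1.0):
    """Contrastive loss (Hadsell et al., 2006) over a batch of embedding pairs.

    same[i] = 1 if pair i is labelled as the same cell type, else 0.
    Same-type pairs are penalised by their squared distance; different-type
    pairs are penalised only while their distance is below `margin`.
    """
    d = np.linalg.norm(z1 - z2, axis=1)                      # Euclidean distance per pair
    pos = same * d**2                                        # pull same-type pairs together
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2      # push different-type pairs apart
    return float(np.mean(pos + neg) / 2)


# Toy usage with random, untrained weights (assumed shapes).
rng = np.random.default_rng(0)
n_pairs, n_genes, n_hidden = 4, 50, 8
w = rng.normal(scale=0.1, size=(n_genes, n_hidden))
b = np.zeros(n_hidden)
x1 = np.log1p(rng.poisson(1.0, size=(n_pairs, n_genes)).astype(float))
x2 = np.log1p(rng.poisson(1.0, size=(n_pairs, n_genes)).astype(float))
same = np.array([1, 0, 1, 0])  # pair labels derived from (possibly noisy) reference labels
loss = contrastive_loss(encode(x1, w, b), encode(x2, w, b), same)
```

In a real pipeline the encoder would be a deeper trained network, and `w` and `b` would be learned by gradient descent. Here they are fixed random values so that the loss can simply be evaluated.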
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.