CanCellCap: robust cancer cell capture across tissue types on single-cell RNA-seq data by multi-domain learning
Authors: Jiaxing Bai, Yichun Gao, Feng Zhou, Yushuang He, Chen Lin, Xiaobing Huang, Ying Wang
Journal: BMC Biology 23(1):230 (JCR Q1, Biology; impact factor 4.5)
Publication type: Journal Article
Published: 2025-07-30
DOI: https://doi.org/10.1186/s12915-025-02337-1
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312500/pdf/
Citations: 0
Abstract
Background: The advent of single-cell RNA sequencing (scRNA-seq) has provided unprecedented insights into cancer cellular diversity, enabling a comprehensive understanding of cancer at the single-cell level. However, identifying cancer cells remains challenging due to gene expression variability caused by tumor or tissue heterogeneity, which negatively impacts generalization and robustness.
Results: We propose CanCellCap, a multi-domain learning framework for identifying cancer cells in scRNA-seq data across tissues, cancers, and sequencing platforms. By integrating domain adversarial learning with a Mixture of Experts, CanCellCap simultaneously extracts gene expression patterns that are common to all tissues and patterns that are specific to individual tissues, for both cancer and normal cells. A masking-reconstruction strategy additionally enables CanCellCap to cope with scRNA-seq data from different sequencing platforms. CanCellCap achieves an average accuracy of 0.977 in cancer cell identification across 13 tissue types, 23 cancer types, and 7 sequencing platforms, and it outperforms five state-of-the-art methods on 33 benchmark datasets. Notably, it maintains high performance on unseen cancer types, unseen tissue types, and even across species, highlighting its effectiveness in challenging scenarios. It also performs well on spatial transcriptomics data, accurately identifying cancer spots. Furthermore, CanCellCap is computationally efficient, completing inference on 100,000 cells in a few minutes. Finally, interpretability analyses reveal critical biomarkers and pathways, offering biological insights.
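The abstract names the model's components but does not describe how they are implemented. As a rough illustration only, and not the authors' code, the sketch below shows two of the named ideas in their simplest generic form: randomly masking entries of a gene expression vector (the input side of a masking-reconstruction objective) and combining several expert scores with softmax gate weights (a basic Mixture of Experts). All function names, the linear experts, the mask rate, and the toy numbers are assumptions made for this example.

```python
import math
import random


def mask_expression(expr, mask_rate, rng):
    """Zero out a random subset of genes; return the masked vector and the mask."""
    mask = [rng.random() < mask_rate for _ in expr]
    masked = [0.0 if m else x for x, m in zip(expr, mask)]
    return masked, mask


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(weights, features):
    """Plain dot product of two equal-length lists."""
    return sum(w * x for w, x in zip(weights, features))


def mixture_of_experts(features, expert_weights, gate_weights):
    """Weight each (hypothetical) linear expert's score by a softmax gate."""
    gates = softmax([dot(g, features) for g in gate_weights])
    expert_scores = [dot(w, features) for w in expert_weights]
    combined = sum(g * s for g, s in zip(gates, expert_scores))
    return combined, gates


# Toy example: one cell with four genes and two experts (all values made up).
rng = random.Random(0)
expr = [2.0, 0.0, 5.0, 1.0]
masked, mask = mask_expression(expr, mask_rate=0.5, rng=rng)

expert_weights = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
gate_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
score, gates = mixture_of_experts(masked, expert_weights, gate_weights)
print(masked, gates, score)
```

In a full masking-reconstruction setup, a model would be trained to recover the original values at the masked positions; that training step, along with the domain adversarial component, is outside the scope of this sketch.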
Conclusions: CanCellCap provides a robust and accurate framework for identifying cancer cells across diverse platforms, tissue types, and cancer types. Its strong generalization to unseen cancers, tissues, and even species, combined with its adaptability to spatial transcriptomics data, underscores its versatility for both research and clinical applications.
About the journal:
BMC Biology is a broad-scope journal covering all areas of biology. Its content includes research articles, new methods, and tools. BMC Biology also publishes reviews, Q&A, and commentaries.