TransAnno-Net: A Deep Learning Framework for Accurate Cell Type Annotation of Mouse Lung Tissue Using Self-supervised Pretraining
Qing Zhang, Xiaoxiao Wu, Xiang Li, Wei Ma, Tongquan Wu, Liuyue Li, Fan Hu, Yicheng Xie, Xinglong Wu
Computer Methods and Programs in Biomedicine, vol. 267, Article 108809. Published 2025-04-24. DOI: 10.1016/j.cmpb.2025.108809
Impact factor: 4.9 (JCR Q1, Computer Science, Interdisciplinary Applications)
https://www.sciencedirect.com/science/article/pii/S0169260725002263
Abstract
Background
Single-cell RNA sequencing (scRNA-seq) has become a significant tool for addressing complex issues in biology. In scRNA-seq analysis, it is imperative to accurately determine the type of each cell. However, conventional supervised or semi-supervised methods depend on expert labels and incur substantial labeling costs. In contrast, self-supervised pre-training strategies leverage unlabeled data during the pre-training phase and use only a limited amount of labeled data in the fine-tuning phase, thereby greatly reducing labor costs. Furthermore, fine-tuning does not need to learn feature representations from scratch, which enhances the efficiency and transferability of the model.
Methods
TransAnno-Net is a deep learning framework based on transfer learning and a Transformer architecture, designed for efficient and accurate cell type annotation in large-scale scRNA-seq datasets of mouse lung. Specifically, TransAnno-Net is pre-trained on scRNA-seq lung data from approximately 100,000 cells to learn gene-gene similarities via self-supervised learning. It is then transferred to a relatively small number of datasets and fine-tuned for specific cell type annotation tasks. To address the cell type imbalance commonly observed in scRNA-seq data, random oversampling is applied to the fine-tuning dataset, mitigating the impact of distributional imbalance on the annotation outcomes.
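The random oversampling step mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so this shows only the generic technique: minority-class cells are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the toy expression vectors, and the cell type labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(cells, labels, seed=0):
    """Duplicate randomly chosen minority-class cells until every class
    has as many samples as the largest class (generic technique, not
    necessarily the authors' exact procedure)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for cell, label in zip(cells, labels):
        by_label[label].append(cell)

    target = max(len(group) for group in by_label.values())
    out_cells, out_labels = [], []
    for label, group in by_label.items():
        # Sample with replacement to fill the gap to the majority-class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for cell in group + extra:
            out_cells.append(cell)
            out_labels.append(label)
    return out_cells, out_labels


# Toy fine-tuning set (hypothetical): four cells of one type, one of another.
cells = [[0.1, 2.0], [0.3, 1.8], [0.2, 2.1], [0.0, 1.9], [5.0, 0.1]]
labels = ["AT2", "AT2", "AT2", "AT2", "Club"]
balanced_cells, balanced_labels = random_oversample(cells, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated cells do not leak into the evaluation data.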
Results
The experimental results show that TransAnno-Net achieves AUCs of 0.979, 0.901, and 0.982 on three mouse lung datasets, respectively, outperforming eight state-of-the-art (SOTA) methods. In addition, TransAnno-Net demonstrates robust performance on cross-organ and cross-platform datasets and is competitive with the fully supervised learning-based method.
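For readers unfamiliar with AUC in a multi-class setting, the sketch below shows one common way to compute it: a one-vs-rest AUC per cell type, averaged without weighting. The abstract does not state which averaging scheme the authors used, so this is an assumption; the function names, the class labels, and the probabilities are made up for illustration.

```python
def binary_auc(scores, positives):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counting one half (Mann-Whitney formulation)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, true_labels, classes):
    """Unweighted mean of one-vs-rest AUCs over all classes (assumed scheme)."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        positives = [y == cls for y in true_labels]
        aucs.append(binary_auc(scores, positives))
    return sum(aucs) / len(aucs)


# Hypothetical predicted probabilities for six cells and three cell types.
classes = ["A", "B", "C"]
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]
true_labels = ["A", "A", "B", "B", "C", "C"]
auc = macro_ovr_auc(prob_rows, true_labels, classes)
print(round(auc, 4))
```

The pairwise comparison is quadratic in the number of cells, which is fine for illustration; a rank-based implementation would be preferable on datasets of realistic size.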
Conclusion
TransAnno-Net is a highly effective cross-platform and cross-dataset single-cell type annotation method for mouse lung tissue that also supports cross-organ cell type annotation. This approach is expected to improve the efficiency of research into the biological mechanisms of complex biological systems and diseases.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.