TransAnno-Net: A Deep Learning Framework for Accurate Cell Type Annotation of Mouse Lung Tissue Using Self-supervised Pretraining

IF 4.9 · CAS Zone 2 (Medicine) · JCR Q1 (Computer Science, Interdisciplinary Applications)
Qing Zhang, Xiaoxiao Wu, Xiang Li, Wei Ma, Tongquan Wu, Liuyue Li, Fan Hu, Yicheng Xie, Xinglong Wu
Computer Methods and Programs in Biomedicine, vol. 267, Article 108809. Published 2025-04-24. DOI: 10.1016/j.cmpb.2025.108809
https://www.sciencedirect.com/science/article/pii/S0169260725002263
Citations: 0

Abstract

Background

Single-cell RNA sequencing (scRNA-seq) has become an important tool for addressing complex questions in biology. In scRNA-seq analysis, it is essential to accurately determine the type of each cell. However, conventional supervised and semi-supervised methods depend on expert labels and incur substantial labeling costs. In contrast, self-supervised pre-training strategies leverage unlabeled data during the pre-training phase and use only a limited amount of labeled data in the fine-tuning phase, thereby greatly reducing labor costs. Furthermore, fine-tuning does not need to learn feature representations from scratch, which improves the efficiency and transferability of the model.

Methods

TransAnno-Net is a deep learning framework based on transfer learning and a Transformer architecture, designed for efficient and accurate cell type annotation in large-scale scRNA-seq datasets of mouse lung. Specifically, TransAnno-Net is pre-trained on scRNA-seq lung data from approximately 100,000 cells to learn gene-gene similarities via self-supervised learning. It is then transferred to a relatively small number of datasets and fine-tuned for specific cell type annotation tasks. To address the cell type imbalance commonly observed in scRNA-seq data, random oversampling is applied to the fine-tuning dataset, which mitigates the impact of distributional imbalance on the annotation outcomes.
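
As an illustration of the random oversampling step, here is a minimal sketch in plain Python. It assumes that oversampling means duplicating minority-class cells at random until every class matches the largest one; the abstract does not give the exact procedure or target ratio, and the function name `random_oversample` and the toy cell-type labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random (with replacement)
    until every class is as large as the largest class.

    Assumption: balancing to the majority-class size; the paper's
    actual target ratio is not stated in the abstract.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies to fill the gap to the majority-class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy example with hypothetical cell-type labels.
cells = [f"cell_{i}" for i in range(10)]
types = ["AT2"] * 7 + ["Club"] * 2 + ["Ciliated"] * 1
balanced_cells, balanced_types = random_oversample(cells, types)
print(Counter(balanced_types))
```

Oversampling of this kind is normally applied only to the fine-tuning (training) split, so that duplicated cells do not leak into the evaluation data.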

Results

TransAnno-Net achieves AUCs of 0.979, 0.901, and 0.982 on three mouse lung datasets, outperforming eight state-of-the-art (SOTA) methods. In addition, TransAnno-Net performs robustly on cross-organ and cross-platform datasets and is competitive with the fully supervised learning-based method.
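
For readers unfamiliar with AUC in a multi-class setting such as cell type annotation, the sketch below shows one common way to compute it: a macro-averaged one-vs-rest AUC. This is an assumption for illustration only; the abstract does not state which averaging scheme the authors used. The function names and the toy probabilities are hypothetical.

```python
def binary_auc(scores, positives):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, labels, classes):
    """Average the one-vs-rest AUC over all classes.

    prob_rows[i][k] is the predicted probability that cell i belongs
    to classes[k].
    """
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        positives = [label == cls for label in labels]
        aucs.append(binary_auc(scores, positives))
    return sum(aucs) / len(aucs)


# Toy predictions for six cells and three hypothetical cell types.
classes = ["A", "B", "C"]
probs = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.4, 0.5, 0.1],
]
labels = ["A", "A", "B", "B", "C", "C"]
macro_auc = macro_ovr_auc(probs, labels, classes)
print(round(macro_auc, 4))
```

The pairwise comparison is quadratic in the number of cells, which is fine for a toy example; a rank-based implementation would be preferable for datasets of realistic size.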

Conclusion

TransAnno-Net is a highly effective cross-platform and cross-dataset single-cell type annotation method for mouse lung tissue, and it also supports cross-organ cell type annotation. This approach is expected to improve the efficiency of research into the biological mechanisms of complex biological systems and diseases.
Source journal
Computer Methods and Programs in Biomedicine (category: Engineering and Technology, Biomedical Engineering)
CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days
Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.