Assessing parameter efficient methods for pre-trained language model in annotating scRNA-seq data

IF 4.2 · Zone 3 (Biology) · JCR Q1, BIOCHEMICAL RESEARCH METHODS

Methods · Pub Date: 2024-05-15 · DOI: 10.1016/j.ymeth.2024.05.007

Yucheng Xia, Yuhang Liu, Tianhao Li, Sihan He, Hong Chang, Yaqing Wang, Yongqing Zhang, Wenyi Ge

Methods, Volume 228, Pages 12-21. Article page: https://www.sciencedirect.com/science/article/pii/S1046202324001233

Citations: 0

Abstract



Annotating cell types of single-cell RNA sequencing (scRNA-seq) data is crucial for studying cellular heterogeneity in the tumor microenvironment. Recently, large-scale pre-trained language models (PLMs) have achieved significant progress in cell-type annotation of scRNA-seq data, effectively addressing the shortcomings of previous methods in performance and generalization. However, fine-tuning a PLM for each downstream task demands considerable computational resources, which makes it impractical. To address this, a new line of research has introduced parameter-efficient fine-tuning (PEFT), which optimizes only a small number of parameters while leaving the majority unchanged, substantially reducing computational cost. Here, we use scBERT, a large-scale pre-trained model, to explore the capabilities of three PEFT methods in scRNA-seq cell type annotation. Extensive benchmark studies across several datasets demonstrate the superior applicability of PEFT methods. Furthermore, downstream analysis using models obtained through PEFT shows their utility in novel cell type discovery and, through model interpretability, in identifying potential marker genes. Our findings underscore the considerable potential of PEFT in PLM-based cell type annotation and offer new perspectives for the analysis of scRNA-seq data.
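The abstract describes PEFT only as "optimizing a few parameters while leaving the majority unchanged" and does not name the three methods evaluated with scBERT. As an illustration, the minimal sketch below uses low-rank adaptation (LoRA), one widely used PEFT technique, chosen here as an assumption rather than as the authors' method. LoRA freezes a weight matrix W and trains only two small matrices A (r x d_in) and B (d_out x r), so a d_out x d_in projection gains r(d_in + d_out) trainable parameters instead of d_in * d_out. The class `LoRALinear`, the helper `lora_fraction`, the layer sizes, and the rank are all hypothetical. The code is plain Python with no deep-learning framework, so it shows the parameter bookkeeping and forward pass, not a training loop.

```python
import random
from typing import List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.02) -> Matrix:
    """Small Gaussian entries; stand-ins for pre-trained or freshly initialised weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen W (d_out x d_in) plus trainable update (alpha / r) * B @ A."""

    def __init__(self, d_in: int, d_out: int, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.weight = random_matrix(d_out, d_in, rng)  # frozen: not updated during fine-tuning
        self.lora_a = random_matrix(rank, d_in, rng)   # trainable, r x d_in
        self.lora_b = [[0.0] * rank for _ in range(d_out)]  # trainable, d_out x r, zero-init
        self.scaling = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)                             # frozen path
        update = matvec(self.lora_b, matvec(self.lora_a, x))      # low-rank adapter path
        return [b + self.scaling * u for b, u in zip(base, update)]

    def merged_weight(self) -> Matrix:
        """Fold the adapter into W so inference costs the same as the original layer."""
        delta = matmul(self.lora_b, self.lora_a)
        return [[w + self.scaling * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def trainable_count(self) -> int:
        return sum(len(r) for r in self.lora_a) + sum(len(r) for r in self.lora_b)

    def frozen_count(self) -> int:
        return sum(len(r) for r in self.weight)


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Trainable fraction of one d_out x d_in projection when only A and B are trained."""
    return rank * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    layer = LoRALinear(d_in=16, d_out=16, rank=2)
    x = [0.1 * i for i in range(16)]
    print(layer.trainable_count(), layer.frozen_count())  # 64 256
    print(layer.forward(x) == matvec(layer.weight, x))    # True (B starts at zero)
    print(lora_fraction(512, 512, rank=8))                # 0.03125
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and `merged_weight` shows how a trained adapter could be folded back into W. With the hypothetical sizes d_in = d_out = 512 and r = 8, only 1/32 of that projection's parameters would be trained.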


Source journal

Methods (Biology: Biochemical Research Methods)

CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks

Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited, high-quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches, with emphasis on clear, detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.