Biology-driven insights into the power of single-cell foundation models

IF 10.1 | Zone 1 (Biology) | JCR Q1: BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Genome Biology | Pub Date: 2025-10-03 | DOI: 10.1186/s13059-025-03781-6

Jialu Wu, Qing Ye, Yilin Wang, Renling Hu, Yiheng Zhu, Mingze Yin, Tianyue Wang, Jike Wang, Chang-Yu Hsieh, Tingjun Hou

Citations: 0

Abstract

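Single-cell foundation models (scFMs) have emerged as powerful tools for integrating heterogeneous datasets and exploring biological systems. Despite high expectations, their ability to extract unique biological insights beyond standard methods and their advantages over traditional approaches in specific tasks remain unclear. Here, we present a comprehensive benchmark study of six scFMs against well-established baselines under realistic conditions, encompassing two gene-level and four cell-level tasks. Pre-clinical batch integration and cell type annotation are evaluated across five datasets with diverse biological conditions, while clinically relevant tasks, such as cancer cell identification and drug sensitivity prediction, are assessed across seven cancer types and four drugs. Model performance is evaluated using 12 metrics spanning unsupervised, supervised, and knowledge-based approaches, including scGraph-OntoRWR, a novel metric designed to uncover intrinsic knowledge encoded by scFMs. We provide holistic rankings from dataset-specific to general performance to guide model selection. Our findings reveal that scFMs are robust and versatile tools for diverse applications while simpler machine learning models are more adept at efficiently adapting to specific datasets, particularly under resource constraints. Notably, no single scFM consistently outperforms others across all tasks, emphasizing the need for tailored model selection based on factors such as dataset size, task complexity, biological interpretability, and computational resources. This benchmark introduces novel evaluation perspectives, identifying the strengths and limitations of current scFMs, and paves the way for their effective application in biological and clinical research, including cell atlas construction, tumor microenvironment studies, and treatment decision-making.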
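
The abstract mentions holistic rankings that move from dataset-specific to general performance. As a rough illustration only, the sketch below assumes one simple scheme: rank models on each metric (higher score is better, ties share the average rank), average those ranks within a dataset, then average the per-dataset ranks into an overall ranking. The abstract does not describe the paper's actual aggregation method, and the model, dataset, and metric names and all scores here are made-up placeholders.

```python
"""Hypothetical sketch: aggregate per-dataset metric scores into rankings.

Assumptions (not from the paper): every metric is "higher is better",
tied models share the average rank, and the overall ranking is the mean
of per-dataset mean ranks.
"""
from statistics import mean


def rank_desc(values: dict[str, float]) -> dict[str, float]:
    """Rank models by score (1 = best), giving tied models their average rank."""
    ordered = sorted(values, key=lambda m: values[m], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models whose score equals ordered[i]'s.
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    return ranks


def dataset_ranks(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """scores[metric][model] -> each model's mean rank across metrics."""
    per_metric = [rank_desc(by_model) for by_model in scores.values()]
    models = per_metric[0].keys()
    return {m: mean(r[m] for r in per_metric) for m in models}


def overall_ranks(
    results: dict[str, dict[str, dict[str, float]]],
) -> list[tuple[str, float]]:
    """results[dataset][metric][model] -> models sorted by mean dataset rank."""
    per_dataset = [dataset_ranks(s) for s in results.values()]
    models = per_dataset[0].keys()
    overall = {m: mean(r[m] for r in per_dataset) for m in models}
    return sorted(overall.items(), key=lambda kv: kv[1])


# Toy, made-up scores for three hypothetical models on two hypothetical datasets.
results = {
    "dataset_1": {
        "metric_x": {"A": 0.9, "B": 0.8, "C": 0.7},
        "metric_y": {"A": 0.6, "B": 0.7, "C": 0.5},
    },
    "dataset_2": {
        "metric_x": {"A": 0.5, "B": 0.9, "C": 0.9},
        "metric_y": {"A": 0.4, "B": 0.3, "C": 0.8},
    },
}

print(dataset_ranks(results["dataset_1"]))  # {'A': 1.5, 'B': 1.5, 'C': 3.0}
print(overall_ranks(results))  # [('B', 1.875), ('A', 2.0), ('C', 2.125)]
```

In this toy example, model C ranks last on dataset_1 but first on dataset_2, which mirrors the kind of task- and dataset-dependent ordering the abstract describes when it notes that no single model wins everywhere. Averaging ranks rather than raw scores keeps metrics on different scales comparable, at the cost of discarding how large each gap is.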

Source journal

Genome Biology (Biochemistry, Genetics and Molecular Biology: Genetics)

CiteScore: 21.00
Self-citation rate: 3.30%
Articles published: 241
Review time: 2 months

Journal description: Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens. With an impact factor of 12.3 (2022), the journal is ranked by Thomson Reuters as the 3rd research journal in the Genetics and Heredity category and the 2nd research journal in the Biotechnology and Applied Microbiology category. Notably, Genome Biology is the highest-ranked open-access journal in this category. Our team of highly trained in-house Editors collaborates closely with our Editorial Board of international experts, keeping the journal at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and during institute visits underscores our commitment to staying abreast of the latest developments in the field.