Ensemble machine learning-based pre-trained annotation approach for scRNA-seq data.

Impact factor 3.3; Zone 3 (Biology); JCR Q2 (Biochemical Research Methods)
Osama Elnahas, Waleed M Ead, Yushan Qiu, Jian Lu
BMC Bioinformatics 26(1):166. Published 2025-07-01. Journal Article.
DOI: https://doi.org/10.1186/s12859-025-06151-y
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220795/pdf/
Citations: 0

Ensemble machine learning-based pre-trained annotation approach for scRNA-seq data using gradient boosting with genetic optimizer.

Abstract

Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of gene expression by allowing researchers to analyze the transcriptomes of individual cells. This technology provides unprecedented insights into cellular heterogeneity, cellular states, and biological processes at a single-cell resolution. The problem of single-cell RNA annotation involves assigning meaningful labels or annotations to each cell in the scRNA-seq dataset, indicating its corresponding cell type, state, or biological function. Current annotation methods are often challenged by limited source data quality, sensitivity to batch effects, and poor adaptability to uncharacterized cell types. We propose an ensemble machine learning-based pre-trained annotation framework that integrates gradient boosting and genetic optimization for robust feature selection. The proposed method uses ensemble learning to enhance annotation accuracy under data scarcity, addressing limitations in existing supervised methods by leveraging a combination of multiple annotated datasets and feature alignment strategies. Through comprehensive benchmarking across varied biological contexts, we demonstrate that the proposed approach significantly improves annotation accuracy and generalization across different scRNA-seq platforms, especially under conditions of reduced reference data. Results confirm its versatility and resilience in accurately annotating cell types, even under reduced data conditions, establishing it as a powerful tool for cell-type classification in scRNA-seq data.
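
The abstract names genetic optimization for feature selection but gives no implementation details, so the following is only a minimal sketch of the general wrapper pattern, not the authors' method. It evolves boolean gene masks with a simple genetic algorithm and scores each mask with a pluggable fitness function. To stay dependency-free, the scorer here is a stand-in nearest-centroid classifier rather than gradient boosting; in a setting like the one described, the fitness would presumably come from a gradient-boosting classifier's validation accuracy. All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import random

def make_toy_data(rng, n_cells=120, n_genes=8, n_informative=2):
    """Synthetic 'expression' matrix: only the first n_informative genes differ by class."""
    X, y = [], []
    for i in range(n_cells):
        label = i % 2
        X.append([rng.gauss(3.0 * label if g < n_informative else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Stand-in scorer (hypothetical): nearest-centroid accuracy on the selected genes."""
    genes = [g for g, keep in enumerate(mask) if keep]
    if not genes:
        return 0.0
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for x, t in zip(X_te, y_te):
        pred = min(centroids, key=lambda c: sum((x[g] - m) ** 2
                                                for g, m in zip(genes, centroids[c])))
        correct += pred == t
    return correct / len(y_te)

def genetic_select(fitness, n_genes, rng, pop_size=20, generations=15, p_mut=0.1):
    """Evolve boolean gene masks; return the best evaluated mask and its fitness."""
    pop = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1][:]
        parents = [m for _, m in scored[:pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)                # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per gene
            children.append([(not bit) if rng.random() < p_mut else bit for bit in child])
        pop = children
    return best, best_fit

rng = random.Random(0)
X, y = make_toy_data(rng)
X_tr, y_tr, X_te, y_te = X[:80], y[:80], X[80:], y[80:]

# Fitness = held-out accuracy minus a small penalty per selected gene.
fitness = lambda m: centroid_accuracy(X_tr, y_tr, X_te, y_te, m) - 0.01 * sum(m)
best_mask, best_fit = genetic_select(fitness, n_genes=8, rng=rng)
print(best_mask, round(best_fit, 3))
```

The per-gene penalty in the fitness is one common way to push the search toward compact feature sets; swapping `centroid_accuracy` for a boosted-tree scorer would leave `genetic_select` unchanged, since it only sees the fitness callable.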

Source journal: BMC Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.