Surrogate-Assisted Multiobjective Gene Selection for Cell Classification From Large-Scale Single-Cell RNA Sequencing Data

IF 11.7 · CAS Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Jianqing Lin;Cheng He;Hanjing Jiang;Yabing Huang;Yaochu Jin
DOI: 10.1109/TEVC.2025.3533490
Journal: IEEE Transactions on Evolutionary Computation, vol. 29, no. 3, pp. 601-615
Publication date: 2025-01-24
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10852178/
Citations: 0

Abstract

Accurate cell classification is crucial but expensive for large-scale single-cell RNA sequencing (scRNA-seq) analysis. Gene selection (GS) is a pivotal technique for identifying gene subsets of scRNA-seq data that improve classification accuracy and reduce the number of genes. Nevertheless, the rising scale of scRNA-seq data challenges existing GS methods in both performance and computational time. Thus, we propose a surrogate-assisted evolutionary algorithm for multiobjective GS to address these deficiencies. An innovative two-phase initialization method is proposed to select sparse solutions that provide preliminary insights into gene contributions. Then, a binary competitive swarm optimizer is proposed for effective global search, in which a local search method is embedded to eliminate irrelevant genes for efficiency. Additionally, a surrogate model is adopted to forecast classification accuracy efficiently, substituting for part of the computationally expensive classification process. Experiments are conducted on eight large-scale scRNA-seq datasets with more than 20 000 genes. Comparison with eight state-of-the-art methods validates the effectiveness of the proposed GS method for scRNA-seq cell classification. Gene expression analysis of the selected genes further validates their significance in the classification of scRNA-seq data.
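To make the multiobjective framing concrete, the following is a minimal sketch, not the authors' implementation. It assumes the two objectives are classification error and the fraction of genes retained, which the abstract implies but does not state explicitly. The names `evaluate`, `dominates`, `non_dominated`, and `toy_error` are hypothetical, and `toy_error` is a stand-in for a real classifier, or for a surrogate that predicts its accuracy.

```python
from typing import Callable, List, Sequence, Tuple

Mask = Tuple[int, ...]            # 1 = gene kept, 0 = gene dropped
Objectives = Tuple[float, float]  # (classification error, fraction of genes kept)


def evaluate(mask: Mask, error_fn: Callable[[Mask], float]) -> Objectives:
    """Return the two objectives (both minimised) for one gene subset."""
    ratio = sum(mask) / len(mask)
    return (error_fn(mask), ratio)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: Sequence[Objectives]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def toy_error(mask: Mask) -> float:
    """Hypothetical classifier stand-in: only genes 0 and 2 are informative."""
    informative = (0, 2)
    hits = sum(mask[g] for g in informative)
    return 1.0 - hits / len(informative)


# Four candidate gene subsets over a toy four-gene dataset.
masks = [(1, 0, 1, 0), (1, 1, 1, 1), (1, 0, 0, 0), (0, 1, 0, 1)]
scores = [evaluate(m, toy_error) for m in masks]
front = non_dominated(scores)  # indices of the trade-off (Pareto) solutions
```

In this toy case the subset keeping only the two informative genes and the subset keeping a single informative gene both survive the filter, since neither beats the other on both error and size. A search procedure such as the swarm optimizer described above would generate the candidate masks, and a surrogate would replace `error_fn` for part of the evaluations.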
Source journal
IEEE Transactions on Evolutionary Computation (Engineering and Technology: Computer Science, Theory and Methods)
CiteScore: 21.90
Self-citation rate: 9.80%
Articles published: 196
Review time: 3.6 months
About the journal: The IEEE Transactions on Evolutionary Computation is published by the IEEE Computational Intelligence Society on behalf of 13 societies: Circuits and Systems; Computer; Control Systems; Engineering in Medicine and Biology; Industrial Electronics; Industry Applications; Lasers and Electro-Optics; Oceanic Engineering; Power Engineering; Robotics and Automation; Signal Processing; Social Implications of Technology; and Systems, Man, and Cybernetics. The journal publishes original papers in evolutionary computation and related areas such as nature-inspired algorithms, population-based methods, optimization, and hybrid systems. It welcomes both purely theoretical papers and application papers that provide general insights into these areas of computation.