基于自适应邻域保持多目标粒子群优化的基因选择。

IF 3.5 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

PeerJ Computer Science Pub Date : 2025-05-28 eCollection Date: 2025-01-01 DOI:10.7717/peerj-cs.2872

Sumet Mehta, Fei Han, Muhammad Sohail, Bhekisipho Twala, Asad Ullah, Fasee Ullah, Arfat Ahmad Khan, Qinghua Ling

{"title":"基于自适应邻域保持多目标粒子群优化的基因选择。","authors":"Sumet Mehta, Fei Han, Muhammad Sohail, Bhekisipho Twala, Asad Ullah, Fasee Ullah, Arfat Ahmad Khan, Qinghua Ling","doi":"10.7717/peerj-cs.2872","DOIUrl":null,"url":null,"abstract":"The analysis of high-dimensional microarray gene expression data presents critical challenges, including excessive dimensionality, increased computational burden, and sensitivity to random initialization. Traditional optimization algorithms often produce inconsistent and suboptimal results, while failing to preserve local data structures limiting both predictive accuracy and biological interpretability. To address these limitations, this study proposes an adaptive neighborhood-preserving multi-objective particle swarm optimization (ANPMOPSO) framework for gene selection. ANPMOPSO introduces four key innovations: (1) a weighted neighborhood-preserving ensemble embedding (WNPEE) technique for dimensionality reduction that retains local structure; (2) Sobol sequence (SS) initialization to enhance population diversity and convergence stability; (3) a differential evolution (DE)-based adaptive velocity update to dynamically balance exploration and exploitation; and (4) a novel ranking strategy that combines Pareto dominance with neighborhood preservation quality to prioritize biologically meaningful gene subsets. Experimental evaluations on six benchmark microarray datasets and eleven multi-modal test functions (MMFs) demonstrate that ANPMOPSO consistently outperforms state-of-the-art methods. For example, it achieves 100% classification accuracy on Leukemia and Small-Round-Blue-Cell Tumor (SRBCT) using only 3-5 genes, improving accuracy by 5-15% over competitors while reducing gene subsets by 40-60%. Additionally, on MMFs, ANPMOPSO attains superior hypervolume values (e.g., 1.0617 ± 0.2225 on MMF1, approximately 10-20% higher than competitors), confirming its robustness in balancing convergence and diversity. Although the method incurs higher training time due to its structural and adaptive components, it achieves a strong trade-off between computational cost and biological relevance, making it a promising tool for high-dimensional gene selection in bioinformatics.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2872"},"PeriodicalIF":3.5000,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192825/pdf/","citationCount":"0","resultStr":"{\"title\":\"Gene selection based on adaptive neighborhood-preserving multi-objective particle swarm optimization.\",\"authors\":\"Sumet Mehta, Fei Han, Muhammad Sohail, Bhekisipho Twala, Asad Ullah, Fasee Ullah, Arfat Ahmad Khan, Qinghua Ling\",\"doi\":\"10.7717/peerj-cs.2872\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The analysis of high-dimensional microarray gene expression data presents critical challenges, including excessive dimensionality, increased computational burden, and sensitivity to random initialization. Traditional optimization algorithms often produce inconsistent and suboptimal results, while failing to preserve local data structures limiting both predictive accuracy and biological interpretability. To address these limitations, this study proposes an adaptive neighborhood-preserving multi-objective particle swarm optimization (ANPMOPSO) framework for gene selection. ANPMOPSO introduces four key innovations: (1) a weighted neighborhood-preserving ensemble embedding (WNPEE) technique for dimensionality reduction that retains local structure; (2) Sobol sequence (SS) initialization to enhance population diversity and convergence stability; (3) a differential evolution (DE)-based adaptive velocity update to dynamically balance exploration and exploitation; and (4) a novel ranking strategy that combines Pareto dominance with neighborhood preservation quality to prioritize biologically meaningful gene subsets. Experimental evaluations on six benchmark microarray datasets and eleven multi-modal test functions (MMFs) demonstrate that ANPMOPSO consistently outperforms state-of-the-art methods. For example, it achieves 100% classification accuracy on Leukemia and Small-Round-Blue-Cell Tumor (SRBCT) using only 3-5 genes, improving accuracy by 5-15% over competitors while reducing gene subsets by 40-60%. Additionally, on MMFs, ANPMOPSO attains superior hypervolume values (e.g., 1.0617 ± 0.2225 on MMF1, approximately 10-20% higher than competitors), confirming its robustness in balancing convergence and diversity. Although the method incurs higher training time due to its structural and adaptive components, it achieves a strong trade-off between computational cost and biological relevance, making it a promising tool for high-dimensional gene selection in bioinformatics.\",\"PeriodicalId\":54224,\"journal\":{\"name\":\"PeerJ Computer Science\",\"volume\":\"11 \",\"pages\":\"e2872\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192825/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PeerJ Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.7717/peerj-cs.2872\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2872","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

高维微阵列基因表达数据的分析面临着严峻的挑战，包括过多的维数、增加的计算负担和对随机初始化的敏感性。传统的优化算法经常产生不一致和次优的结果，同时不能保留局部数据结构，限制了预测的准确性和生物可解释性。为了解决这些局限性，本研究提出了一种自适应邻域保持多目标粒子群优化（ANPMOPSO）基因选择框架。ANPMOPSO引入了四个关键创新：(1)保留局部结构的加权邻域保持集成嵌入（WNPEE）降维技术；(2) Sobol序列初始化，增强种群多样性和收敛稳定性；(3)基于差分进化（DE）的自适应速度更新，实现勘探和开采的动态平衡；(4)结合帕累托优势和邻域保存质量的排序策略，对具有生物学意义的基因子集进行排序。对六个基准微阵列数据集和11个多模态测试函数（mmf）的实验评估表明，ANPMOPSO始终优于最先进的方法。例如，仅使用3-5个基因，它在白血病和小圆蓝细胞肿瘤（SRBCT）上实现了100%的分类准确率，比竞争对手提高了5-15%，同时减少了40-60%的基因子集。此外，在mmf上，ANPMOPSO获得了优越的超容积值（例如，MMF1上的1.0617±0.2225，比竞争对手高出约10-20%），证实了其在平衡收敛性和多样性方面的稳健性。尽管该方法由于其结构和自适应成分而需要较高的训练时间，但它在计算成本和生物相关性之间实现了良好的权衡，使其成为生物信息学中高维基因选择的有前途的工具。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Gene selection based on adaptive neighborhood-preserving multi-objective particle swarm optimization.

The analysis of high-dimensional microarray gene expression data presents critical challenges, including excessive dimensionality, increased computational burden, and sensitivity to random initialization. Traditional optimization algorithms often produce inconsistent and suboptimal results, while failing to preserve local data structures limiting both predictive accuracy and biological interpretability. To address these limitations, this study proposes an adaptive neighborhood-preserving multi-objective particle swarm optimization (ANPMOPSO) framework for gene selection. ANPMOPSO introduces four key innovations: (1) a weighted neighborhood-preserving ensemble embedding (WNPEE) technique for dimensionality reduction that retains local structure; (2) Sobol sequence (SS) initialization to enhance population diversity and convergence stability; (3) a differential evolution (DE)-based adaptive velocity update to dynamically balance exploration and exploitation; and (4) a novel ranking strategy that combines Pareto dominance with neighborhood preservation quality to prioritize biologically meaningful gene subsets. Experimental evaluations on six benchmark microarray datasets and eleven multi-modal test functions (MMFs) demonstrate that ANPMOPSO consistently outperforms state-of-the-art methods. For example, it achieves 100% classification accuracy on Leukemia and Small-Round-Blue-Cell Tumor (SRBCT) using only 3-5 genes, improving accuracy by 5-15% over competitors while reducing gene subsets by 40-60%. Additionally, on MMFs, ANPMOPSO attains superior hypervolume values (e.g., 1.0617 ± 0.2225 on MMF1, approximately 10-20% higher than competitors), confirming its robustness in balancing convergence and diversity. Although the method incurs higher training time due to its structural and adaptive components, it achieves a strong trade-off between computational cost and biological relevance, making it a promising tool for high-dimensional gene selection in bioinformatics.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

PeerJ Computer Science Computer Science-General Computer Science

CiteScore

6.10

自引率

5.30%

发文量

332

审稿时长

10 weeks

期刊介绍： PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.