Stable Feature Selection using Improved Whale Optimization Algorithm for Microarray Datasets

Dipti Theng, K. Bhoyar
DOI: 10.14201/adcaij.31187
ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal, published 2023-12-19

Abstract

A microarray is a collection of DNA sequences that reflect an organism's whole gene set and are organized in a grid pattern for use in genetic testing. Microarray datasets are extremely high-dimensional but have very small sample sizes, which poses the challenges of insufficient data and high computational complexity. Addressing these issues requires identifying the true biomarkers, i.e., the most significant features, which form a very small subset of the complete feature set. Doing so reduces over-fitting and time complexity and improves model generalization. Various feature selection algorithms are used for this biomarker identification. This research proposes a modified whale optimization algorithm (WOAm) for biomarker discovery, in which the fitness of each search agent is evaluated with the hinge loss function during the hunting-for-prey phase to determine the optimal search agent. The results of the proposed algorithm are compared with those of the original whale optimization algorithm and of contemporary algorithms such as the marine predator algorithm and grey wolf optimization. All of these algorithms are evaluated on six different high-dimensional microarray datasets. The proposed modification significantly improved feature selection results across all the datasets. Because domain experts' trust in the resulting biomarkers and associated genes depends on the stability of the results, the stability of the chosen feature sets was also evaluated. According to the findings, our proposed WOAm is more stable than the other algorithms on the CNS, colon, Leukemia, and OSCC datasets.
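To make the idea of a hinge-loss fitness inside WOA more concrete, here is a minimal pure-Python sketch of a wrapper-style feature selector. Each whale's position is scored by the hinge loss of a linear classifier trained on the genes that position selects, and every updated whale is re-scored during the hunting phase to track the best search agent. This is not the authors' WOAm implementation. Several details are assumptions not stated in the abstract: the 0.5 binarization threshold, the SGD-trained linear SVM, measuring loss on the training data rather than by cross-validation, the subset-size penalty weight `alpha`, and the use of a scalar `A` and `C` per whale. All function names are hypothetical.

```python
import math
import random


def hinge_loss(w, b, X, y):
    """Mean hinge loss max(0, 1 - y * (w.x + b)) for labels y in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total += max(0.0, 1.0 - yi * score)
    return total / len(X)


def train_linear_svm(X, y, epochs=20, lr=0.05, lam=0.01):
    """Linear classifier fitted by SGD on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # margin violated: hinge term contributes
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:                 # only the regulariser contributes
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def fitness(position, X, y, alpha=0.01):
    """Hinge-loss fitness of one search agent (lower is better).

    ASSUMPTIONS (hypothetical, not from the paper): genes with position > 0.5
    are selected, loss is measured on the training data, and a subset-size
    penalty alpha * |S| / d is added.
    """
    cols = [j for j, p in enumerate(position) if p > 0.5]
    if not cols:
        return float("inf")  # an empty subset cannot be scored
    Xs = [[row[j] for j in cols] for row in X]
    w, b = train_linear_svm(Xs, y)
    return hinge_loss(w, b, Xs, y) + alpha * len(cols) / len(position)


def woa_feature_selection(X, y, n_agents=6, n_iter=15, seed=0):
    """Standard WOA update rules; every updated whale is re-scored."""
    rng = random.Random(seed)
    d = len(X[0])
    agents = [[rng.random() for _ in range(d)] for _ in range(n_agents)]
    scores = [fitness(a, X, y) for a in agents]
    i0 = min(range(n_agents), key=scores.__getitem__)
    best, best_score = agents[i0][:], scores[i0]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i in range(n_agents):
            A = 2.0 * a * rng.random() - a  # scalar A, C per whale (simplification)
            C = 2.0 * rng.random()
            p, l = rng.random(), rng.uniform(-1.0, 1.0)
            x = agents[i]
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random one
                ref = best if abs(A) < 1.0 else agents[rng.randrange(n_agents)]
                new = [r - A * abs(C * r - xj) for r, xj in zip(ref, x)]
            else:
                # bubble-net spiral towards the best whale (spiral constant b = 1)
                new = [abs(bj - xj) * math.exp(l) * math.cos(2 * math.pi * l) + bj
                       for bj, xj in zip(best, x)]
            agents[i] = [min(1.0, max(0.0, v)) for v in new]  # clip to [0, 1]
            s = fitness(agents[i], X, y)  # hinge-loss evaluation in the hunting phase
            if s < best_score:
                best, best_score = agents[i][:], s
    return [1 if v > 0.5 else 0 for v in best], best_score


# Usage on hypothetical synthetic data: genes 0 and 1 carry the class signal.
data_rng = random.Random(1)
X, y = [], []
for _ in range(40):
    label = data_rng.choice([-1, 1])
    signal = [label + data_rng.gauss(0, 0.5), -label + data_rng.gauss(0, 0.5)]
    X.append(signal + [data_rng.gauss(0, 1) for _ in range(8)])
    y.append(label)
mask, score = woa_feature_selection(X, y)
print(mask, round(score, 3))
```

The size penalty is included because pure training-set hinge loss tends to favour keeping every feature. Real microarray data would call for a held-out or cross-validated loss and a vectorised classifier.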
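The abstract reports that the stability of the selected feature sets was evaluated, but it does not name the measure used. As an illustration only, the sketch below scores stability as the average pairwise Jaccard similarity between the gene subsets returned by repeated runs, for example on different resamples of a dataset. The choice of Jaccard, the function names, and the gene indices are all assumptions for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two feature-index sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard similarity over feature subsets from repeated runs.

    1.0 means every run selected exactly the same genes; values near 0 mean
    the runs share almost no genes.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene indices selected in three runs
runs = [{3, 17, 42, 108}, {3, 17, 42, 256}, {3, 17, 108, 512}]
print(round(selection_stability(runs), 3))  # → 0.511
```

Jaccard is used here because it accepts subsets of different sizes. Measures such as Kuncheva's consistency index additionally correct for chance overlap but assume equal subset sizes.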