BIMSSA: enhancing cancer prediction with salp swarm optimization and ensemble machine learning approaches.

IF 2.8, Zone 3 (Biology), Q2 GENETICS & HEREDITY

Frontiers in Genetics. Pub Date: 2025-01-06. eCollection Date: 2024-01-01. DOI: 10.3389/fgene.2024.1491602

Pinakshi Panda, Sukant Kishoro Bisoy, Amrutanshu Panigrahi, Abhilash Pati, Bibhuprasad Sahu, Zheshan Guo, Haipeng Liu, Prince Jain


Citations: 0

Abstract



Background: Cancer incidence is rising rapidly and is a major cause of mortality worldwide. According to the World Health Organization (WHO), 9.9 million people died from cancer in 2020. Machine learning (ML) can help identify cancer early, which may reduce deaths. An ML-based cancer diagnostic model can use a patient's genetic information, such as microarray data. Microarray data are high-dimensional, which can degrade the performance of ML-based models, so feature selection becomes essential.

Methods: The Salp Swarm Optimization Algorithm (SSA), Improved Maximum Relevance and Minimum Redundancy (IMRMR), and Boruta form the basis of BIMSSA, the ML-based model proposed in this work. BIMSSA implements a pipelined feature selection method to handle high-dimensional microarray data. First, Boruta and IMRMR were applied to extract relevant gene expression features. Then, SSA was used to optimize the size of the feature subset. On the resulting feature space, five ML classifiers were applied as base learners: Support Vector Machine (SVM), Random Forest (RF), Extreme Learning Machine (ELM), AdaBoost, and XGBoost. Majority voting was then used to build an ensemble of the top three of these classifiers. The ensemble model was evaluated on microarray data from four cancer datasets: adult acute lymphoblastic leukemia and acute myelogenous leukemia (ALL-AML), Lymphoma, mixed-lineage leukemia (MLL), and small round blue cell tumors (SRBCT).
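
The final ensembling step described in the Methods (keep the three best of the five base learners, then combine them by majority vote) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of validation accuracy to rank the learners, the tie-breaking rule, and the toy labels are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def top_k_learners(val_preds: Dict[str, Sequence[str]],
                   y_val: Sequence[str], k: int = 3) -> List[str]:
    """Rank learners by validation accuracy (assumed criterion); keep the best k."""
    ranked = sorted(val_preds,
                    key=lambda name: accuracy(y_val, val_preds[name]),
                    reverse=True)
    return ranked[:k]


def majority_vote(preds: Dict[str, Sequence[str]],
                  members: Sequence[str]) -> List[str]:
    """Hard-voting ensemble over `members`, which must be ordered best-first.

    Assumed tie-break: if several labels share the top vote count, use the
    label predicted by the highest-ranked member among them.
    """
    n_samples = len(preds[members[0]])
    voted = []
    for i in range(n_samples):
        votes = [preds[m][i] for m in members]
        counts = Counter(votes)
        best = max(counts.values())
        voted.append(next(v for v in votes if counts[v] == best))
    return voted


# Toy labels, invented for illustration only (not data from the paper).
y_val = ["ALL", "AML", "ALL", "AML", "ALL", "AML"]
val_preds = {
    "SVM":      ["ALL", "AML", "ALL", "AML", "ALL", "AML"],
    "RF":       ["ALL", "AML", "ALL", "AML", "ALL", "ALL"],
    "ELM":      ["AML", "AML", "ALL", "ALL", "ALL", "ALL"],
    "AdaBoost": ["ALL", "ALL", "ALL", "AML", "AML", "AML"],
    "XGBoost":  ["ALL", "AML", "AML", "AML", "ALL", "AML"],
}
test_preds = {
    "SVM":      ["ALL", "AML", "AML", "ALL"],
    "RF":       ["ALL", "ALL", "AML", "AML"],
    "ELM":      ["AML", "AML", "ALL", "ALL"],
    "AdaBoost": ["AML", "AML", "AML", "ALL"],
    "XGBoost":  ["AML", "ALL", "AML", "AML"],
}

members = top_k_learners(val_preds, y_val, k=3)
ensemble = majority_vote(test_preds, members)
print(members)
print(ensemble)
```

With three voters a strict majority always exists for two-class data such as ALL-AML. The tie-break only matters for datasets with three or more classes, where all three members can disagree.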

Results: In the empirical evaluation, the proposed BIMSSA (Boruta + IMRMR + SSA) achieved accuracies of 96.7% on ALL-AML, 96.2% on Lymphoma, 95.1% on MLL, and 97.1% on SRBCT.

Conclusion: The results show that the proposed approach can accurately predict different forms of cancer, which is useful for both physicians and researchers.


Source journal

Frontiers in Genetics (Biochemistry, Genetics and Molecular Biology: Molecular Medicine)

CiteScore: 5.50
Self-citation rate: 8.10%
Articles published: 3491
Review time: 14 weeks

About the journal: Frontiers in Genetics publishes rigorously peer-reviewed research on genes and genomes relating to all the domains of life, from humans to plants to livestock and other model organisms. Led by an outstanding Editorial Board of the world’s leading experts, this multidisciplinary, open-access journal is at the forefront of communicating cutting-edge research to researchers, academics, clinicians, policy makers and the public. The study of inheritance and the impact of the genome on various biological processes is well documented. However, the majority of discoveries are still to come. A new era is seeing major developments in the function and variability of the genome, the use of genetic and genomic tools and the analysis of the genetic basis of various biological phenomena.