A Novel Leukemia Gene Features Extraction and Selection Technique for Robust Type Prediction Using Machine Learning

IF 2.9, Zone 4 (multidisciplinary journals), JCR Q1 Multidisciplinary

Arabian Journal for Science and Engineering Pub Date : 2024-07-02 DOI:10.1007/s13369-024-09254-5

Mahwish Ilyas, Khalid Mahmood Aamir, Abdul Jaleel, Mohamed Deriche


Citation count: 0

Abstract

The broad term ‘leukemia’ refers to several types of cancer that affect blood cells. Identifying the specific type of leukemia remains a major challenge in medicine. Machine learning techniques can play a vital role in analyzing gene expression data from microarray experiments in leukemia research. Here, we use the Leukemia Gene Expression data from the Curated Microarray Database (CuMiDa). Determining expression patterns from microarray data can be challenging. In this work, we combine Fisher’s linear discriminant analysis, a popular dimensionality reduction technique, with a new feature selection approach to predict leukemia type from microarray data. Our machine learning model predicts five types of leukemia, including AML, PBSC CD34, Bone Marrow, and CD34 from the bone marrow. We first rescale the data features. We then use a feature selection technique to obtain the 25 most significant of the dataset’s 22,283 features, and further reduce the dimension to only 5 features to lower computational complexity. These features are then fed into a Fisher’s linear discriminant module and a likelihood-based index for classification. We examine the results using 2, 4, 5, 6, and 7 selected features. The best classification accuracies are 89.6%, 96.92%, and 96.15% for 2, 5, and 7 selected features, respectively. Our results outperform the state of the art by about 4%, with a task completion time of less than 100 ms.
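The rescale, select, and classify steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' feature selection method or their likelihood-based index, so this example swaps in two standard stand-ins: a per-feature Fisher score (between-class over within-class variance) for ranking, and a diagonal Gaussian log-likelihood for classification. The function names and the toy data are hypothetical and are not taken from the paper or from CuMiDa.

```python
import math
from collections import defaultdict


def min_max_rescale(rows):
    """Rescale every feature (column) to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # A constant column has zero span; fall back to 1.0 to avoid dividing by zero.
    spans = [(max(col) - low) or 1.0 for col, low in zip(columns, lows)]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def group_by_class(rows, labels):
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def fisher_scores(rows, labels):
    """Per-feature ratio of between-class to within-class scatter (stand-in ranking)."""
    groups = group_by_class(rows, labels)
    scores = []
    for j in range(len(rows[0])):
        overall_mean = sum(row[j] for row in rows) / len(rows)
        between = within = 0.0
        for members in groups.values():
            values = [row[j] for row in members]
            mean = sum(values) / len(values)
            between += len(values) * (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in values)
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(rows, labels, k):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_gaussians(rows, labels, eps=1e-6):
    """Per-class, per-feature mean and variance (diagonal Gaussian, an assumption)."""
    model = {}
    for label, members in group_by_class(rows, labels).items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + eps
            for col, m in zip(columns, means)
        ]
        model[label] = (means, variances)
    return model


def predict(model, row):
    """Return the class whose Gaussian gives the row the highest log-likelihood."""
    def log_likelihood(params):
        means, variances = params
        return sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_likelihood(model[label]))


# Made-up toy data: 6 samples, 3 "genes", 2 classes (not real expression values).
samples = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.5],
    [3.0, 5.5, 0.4],
    [3.2, 4.5, 0.8],
    [2.9, 5.0, 0.1],
]
labels = ["AML", "AML", "AML", "Bone_Marrow", "Bone_Marrow", "Bone_Marrow"]

scaled = min_max_rescale(samples)
selected = select_top_k(scaled, labels, k=2)
reduced = [[row[j] for j in selected] for row in scaled]
model = fit_gaussians(reduced, labels)
predictions = [predict(model, row) for row in reduced]
```

On real data the same shape would apply with 22,283 input features and a held-out test split; predicting on the training samples, as done here, only checks that the pieces fit together and says nothing about accuracy.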




Source journal

Arabian Journal for Science and Engineering (multidisciplinary journals)

CiteScore: 5.20

Self-citation rate: 3.40%

Review time: 4.3 months

About the journal: King Fahd University of Petroleum & Minerals (KFUPM) partnered with Springer to publish the Arabian Journal for Science and Engineering (AJSE). AJSE, which has been published by KFUPM since 1975, is a recognized national, regional and international journal that provides a great opportunity for the dissemination of research advances from the Kingdom of Saudi Arabia, MENA and the world.