Mahwish Ilyas, Khalid Mahmood Aamir, Abdul Jaleel, Mohamed Deriche
Arabian Journal for Science and Engineering. Published 2024-07-02. DOI: 10.1007/s13369-024-09254-5 (https://doi.org/10.1007/s13369-024-09254-5)
A Novel Leukemia Gene Features Extraction and Selection Technique for Robust Type Prediction Using Machine Learning
The broad term ‘leukemia’ covers several types of cancer that affect blood cells. Identifying the specific type of leukemia remains a major challenge in medicine. Machine learning techniques can help analyze gene expression data from microarray experiments in leukemia research, although expression patterns in microarray data are difficult to determine. This work uses the Leukemia Gene Expression data from the Curated Microarray Database (CuMiDa). We combine Fisher’s linear discriminant analysis, a popular technique for dimensionality reduction, with a new feature selection approach to predict leukemia type from microarray data. The model predicts five types of leukemia, including AML, PBSC CD34, Bone Marrow, and CD34 from the bone marrow. The data features are first rescaled. A feature selection technique then picks the 25 most significant of the dataset’s 22,283 features, and the dimension is further reduced to only 5 features to lower computational complexity. These features are fed into a Fisher’s linear discriminant module and a likelihood-based index for classification. We examine the results using 2, 4, 5, 6, and 7 selected features. The best classification accuracies are 89.6%, 96.92%, and 96.15% for 2, 5, and 7 selected features, respectively. Our results outperform the state of the art by about 4%, with a task completion time of less than 100 ms.
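The first two steps named in the abstract, rescaling the features and keeping only the most discriminative ones, can be illustrated with a small sketch. The abstract does not specify the authors' selection criterion, so the code below substitutes a standard per-feature Fisher score (between-class scatter divided by within-class scatter) as a stand-in. The toy matrix, the function names, and the choice of min-max rescaling are all assumptions made for illustration; this is not the paper's implementation and uses no CuMiDa data.

```python
from statistics import mean, pvariance


def min_max_rescale(rows):
    """Rescale every feature (column) to [0, 1]; constant columns become 0."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def fisher_scores(rows, labels):
    """Per-feature Fisher score: between-class / within-class scatter.

    Assumed stand-in criterion, not the selection method from the paper.
    """
    classes = sorted(set(labels))
    scores = []
    for col in zip(*rows):
        overall = mean(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, y in zip(col, labels) if y == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or uninformative.
            scores.append(float("inf") if between else 0.0)
    return scores


def select_top_k(rows, labels, k):
    """Return the (sorted) column indices of the k highest-scoring features."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 6 samples x 4 "genes", three classes.
X = [
    [1.0, 5.0, 10.0, 7.0],
    [1.2, 3.0, 12.0, 7.0],
    [5.0, 4.0, 20.0, 7.0],
    [5.1, 6.0, 22.0, 7.0],
    [9.0, 5.0, 11.0, 7.0],
    [9.3, 3.0, 13.0, 7.0],
]
y = ["AML", "AML", "Bone Marrow", "Bone Marrow", "PBSC CD34", "PBSC CD34"]

X_scaled = min_max_rescale(X)
scores = fisher_scores(X_scaled, y)
selected = select_top_k(X_scaled, y, k=2)
print(selected)  # [0, 2]
```

In the setting the abstract describes, the same ranking would run over 22,283 columns with k = 25, and the retained columns would then go to the Fisher discriminant stage. A univariate score like this one ignores correlations between genes, which is one reason a projection step such as LDA is typically applied afterwards.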
About the journal:
King Fahd University of Petroleum & Minerals (KFUPM) partnered with Springer to publish the Arabian Journal for Science and Engineering (AJSE).
AJSE, which has been published by KFUPM since 1975, is a recognized national, regional and international journal that provides a great opportunity for the dissemination of research advances from the Kingdom of Saudi Arabia, MENA and the world.