Multiclass classification of leukemia cancer subtypes using gene expression data and Optimized Dueling Double Deep Q-network
Authors: R. Jayakrishnan, S. Meera
Journal: Chemometrics and Intelligent Laboratory Systems, Volume 262, Article 105402 (published 2025-04-10)
DOI: 10.1016/j.chemolab.2025.105402
URL: https://www.sciencedirect.com/science/article/pii/S0169743925000875
Impact factor: 3.7 (JCR Q2, Automation & Control Systems)
Open access: no
Citations: 0
Abstract
Microarray technology aids in gene expression tracking, but diagnosing complex conditions like leukemia remains challenging due to multiple clinical factors. Deep reinforcement learning (DRL) for cancer classification faces challenges related to optimization, handling high-dimensional noisy data, and interpretability. To address these limitations, this study proposes an Optimized Dueling Double Deep Q-Network (DDDQ-N) framework, integrating advanced feature selection and DRL for robust leukemia subtype prediction. The framework begins with pre-processing, which includes data cleaning, normalization, and addressing class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE). To enhance interpretability and reduce dimensionality, a novel Butterfly Optimization with Chaotic Local Search (BO-CLS) algorithm is introduced for feature selection, efficiently identifying the most discriminative genes. The selected features are then processed by the Dueling Double Deep Q-Network (DDDQ-N), combining deep representation learning with reinforcement learning for sequential decision-making. The model employs a custom reward function and episode-based training to handle multi-class imbalance, adapt to tumor heterogeneity, and optimize classification strategies. Experimental results on a multi-class leukemia gene expression dataset demonstrate the model's superiority, achieving 99 % accuracy, 98.8 % precision, 99.2 % recall, and 99 % F1-score, outperforming existing methods such as Machine Learning (ML) Ensemble (94 %), Stacked Autoencoders with Grey Wolf Optimization (SAE-GWO) (98 %), and Feature Selective Neuro Evolution of Augmenting Topologies (FS-NEAT) (93 %). The proposed BO-CLS feature selection also shows significant improvements over ChisIG-SMOTE (95.5 % accuracy) and Least Absolute Shrinkage and Selection Operator-Multi-Objective Genetic Algorithm (LASSO-MOGAT) (94.7 % accuracy), confirming its effectiveness in dimensionality reduction. These findings highlight the potential of the proposed framework to revolutionize leukemia diagnosis and provide a more efficient, interpretable, and accurate approach for clinical applications.
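The abstract names the building blocks of the classifier (a dueling value/advantage split, a double-DQN target, and a custom reward for class imbalance) without giving their formulas. The sketch below illustrates the standard textbook forms of the first two in plain Python. The reward function is purely hypothetical: the abstract does not specify the authors' reward, so `imbalance_reward` and its inverse-frequency weighting are assumptions made for illustration only, as are all function and parameter names.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-subtracted dueling form.

    Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (a - mean_adv) for a in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float]) -> float:
    """Double-DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward  # no bootstrapping at the end of an episode
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def imbalance_reward(predicted: int, actual: int, class_counts: Sequence[int]) -> float:
    """Hypothetical imbalance-aware reward (an assumption, not the paper's).

    The rarest class gets weight 1.0; more frequent classes get
    proportionally smaller weights. Correct predictions earn +weight,
    mistakes earn -weight.
    """
    weight = min(class_counts) / class_counts[actual]
    return weight if predicted == actual else -weight


if __name__ == "__main__":
    # Three subtypes treated as three actions; the numbers are made up.
    print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))
    print(double_dqn_target(1.0, False, 0.5, [0.1, 0.9, 0.3], [2.0, 4.0, 8.0]))
    print(imbalance_reward(predicted=0, actual=1, class_counts=[10, 40]))
```

In this framing each sample is a state and each leukemia subtype is an action, so a correct prediction on a rare subtype contributes more to the return than one on a common subtype. Whether the paper weights rewards this way is not stated in the abstract.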
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration etc.) Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.