Sparsity regularization enhances gene selection and leukemia subtype classification via logistic regression

IF 2.1 | Zone 4 (Medicine) | Q3 | HEMATOLOGY

Leukemia research Pub Date : 2025-02-11 DOI:10.1016/j.leukres.2025.107663

Nozad Hussein Mahmood, Dler Hussein Kadir

Leukemia research, volume 150, Article 107663. https://www.sciencedirect.com/science/article/pii/S0145212625000232

Citations: 0

Abstract

This study investigated sparsity regularization methods for improving the classification of leukemia subtypes from high-dimensional gene expression data. Multinomial logistic regression models with Ridge, Lasso, and Elastic Net regularization were used to address overfitting and high dimensionality while improving model interpretability. The study used a leukemia dataset from the Curated Microarray Database (CuMiDa) containing expression data for 16,383 genes across 281 samples representing seven types of leukemia. Accuracy, the Kappa statistic, AUC, and F1-score were used to measure model performance. The effectiveness of each method for gene selection and dimensionality reduction was also discussed. Elastic Net outperformed Ridge and Lasso in overall classification performance, reaching the highest accuracy and Kappa values. Lasso and Elastic Net both performed more effective feature selection, producing sparse models that could efficiently discriminate leukemia subtypes. The results highlight that sparsity regularization can improve both understanding and accuracy in the challenging task of leukemia subtype classification, enabling more tailored treatments.
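As a rough illustration of the approach the abstract describes, the sketch below fits multinomial logistic regression under Ridge, Lasso, and Elastic Net penalties and reports accuracy, Cohen's kappa, and the number of features kept. It is not the authors' implementation, which the abstract does not specify. Everything here is an assumption made for illustration: the small synthetic dataset standing in for CuMiDa, the proximal-gradient solver, the function names `fit` and `cohen_kappa`, and the penalty settings `lam=0.1` and `alpha` in {0, 1, 0.5}.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit(X, y, n_classes, lam, alpha, n_iter=500):
    """Proximal-gradient multinomial logistic regression (illustrative).

    Penalty on the weights W (the intercept b is unpenalised):
        lam * (alpha * ||W||_1 + 0.5 * (1 - alpha) * ||W||_2^2)
    alpha=0 gives Ridge, alpha=1 Lasso, values in between Elastic Net.
    """
    n, p = X.shape
    W, b = np.zeros((p, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    # Step size 1/L from a curvature bound on the smooth part of the objective.
    Xa = np.hstack([X, np.ones((n, 1))])
    step = 1.0 / (0.5 * np.linalg.norm(Xa, 2) ** 2 / n + lam * (1 - alpha))
    for _ in range(n_iter):
        R = softmax(X @ W + b) - Y  # residuals, shape (n, n_classes)
        W = W - step * (X.T @ R / n + lam * (1 - alpha) * W)
        b = b - step * R.mean(axis=0)
        # Soft-thresholding: the proximal step for the L1 part of the penalty.
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam * alpha, 0.0)
    return W, b

def cohen_kappa(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true, y_pred), 1)       # confusion matrix counts
    cm /= cm.sum()
    p_obs = np.trace(cm)                     # observed agreement
    p_exp = cm.sum(axis=1) @ cm.sum(axis=0)  # agreement expected by chance
    return (p_obs - p_exp) / (1.0 - p_exp)

# Synthetic stand-in data (assumption): 3 classes, 60 "genes", 6 informative.
rng = np.random.default_rng(0)
n, p, k = 240, 60, 3
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, p))
for c in range(k):
    X[y == c, 2 * c : 2 * c + 2] += 2.0  # two informative features per class
train, test = np.arange(n) < 160, np.arange(n) >= 160

results = {}
for name, alpha in [("ridge", 0.0), ("lasso", 1.0), ("elastic_net", 0.5)]:
    W, b = fit(X[train], y[train], k, lam=0.1, alpha=alpha)
    pred = np.argmax(X[test] @ W + b, axis=1)
    results[name] = {
        "accuracy": float(np.mean(pred == y[test])),
        "kappa": float(cohen_kappa(y[test], pred, k)),
        # A feature counts as selected if any class keeps a nonzero weight.
        "n_selected": int(np.any(W != 0, axis=1).sum()),
    }
    print(name, results[name])
```

In this sketch the Ridge penalty only shrinks coefficients, so every feature stays in the model, while the L1 term in Lasso and Elastic Net sets many coefficients exactly to zero. That is the sense in which the latter two perform gene selection. In practice the penalty strength would be tuned, for example by cross-validation, rather than fixed as it is here.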

Source journal

Leukemia research (Medicine: Hematology)

CiteScore: 4.00

Self-citation rate: 3.70%

Articles published: 259

Review time: 1 month

Journal description: Leukemia Research is an international journal that brings comprehensive and current information to all health care professionals involved in basic and applied clinical research in hematological malignancies. The editors encourage the submission of articles relevant to hematological malignancies. The journal's scope includes studies of the cellular and molecular biology, genetics, immunology, epidemiology, clinical evaluation, and therapy of these diseases.