Jing Lv, Jinmi Li, Xiaodong Ren, Qing Huang, Shaoli Deng
Machine Learning for Discriminating Microcytic Hypochromic Anemia Based on Erythrocyte Parameters.
International Journal of Laboratory Hematology. Published 2025-07-01. DOI: 10.1111/ijlh.14524 (https://doi.org/10.1111/ijlh.14524)
Abstract
Introduction: Thalassemia trait (TT) and iron deficiency anemia (IDA) are two common types of microcytic hypochromic anemia (MHA), but current diagnostic methods have limitations. This study applied machine learning (ML) algorithms to erythrocyte parameters to distinguish TT from IDA among patients with MHA.
Methods: We retrospectively analyzed 193 subjects with MHA (98 TT and 95 IDA). The cohort was randomly split into a training set (60%), a validation set (20%), and a test set (20%). Erythrocyte parameters were measured on an automated hematology analyzer (DxH800, Beckman Coulter), and five ML algorithms were used to build discriminant models: Random Forest, XGBoost, logistic regression, AdaBoost, and LightGBM. Model performance was assessed by sensitivity, specificity, accuracy, area under the ROC curve (AUC), negative predictive value (NPV), positive predictive value (PPV), cutoff, F1 score, and Kappa coefficient.
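The 60/20/20 random split described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not report the exact set sizes, the rounding rule, the random seed, or whether the split was stratified by diagnosis, so the function name `split_60_20_20`, the seed, and the rounding used here are all assumptions.

```python
import random


def split_60_20_20(n_subjects, seed=0):
    """Shuffle subject indices and split them 60/20/20.

    Assumption: training and validation sizes are rounded to the nearest
    integer and the test set takes the remainder. The abstract does not
    state how the sizes were rounded.
    """
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_train = round(0.6 * n_subjects)
    n_val = round(0.2 * n_subjects)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder
    return train, val, test


# 193 subjects, as in the abstract (98 TT + 95 IDA)
train_idx, val_idx, test_idx = split_60_20_20(193)
print(len(train_idx), len(val_idx), len(test_idx))  # 116 39 38
```

With a cohort this small, a split stratified by diagnosis would keep the TT/IDA ratio similar across the three sets; whether the study did so is not stated.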
Results: Of the five ML algorithms, Random Forest and logistic regression showed the best discriminant performance on the test set. AUC, sensitivity, specificity, and accuracy were 0.977, 0.928, 0.953, and 0.940 for Random Forest, and 0.978, 0.879, 0.979, and 0.928 for logistic regression. Eight key peripheral erythrocyte parameters were ultimately selected: RBC, RDW, MCV, MCHC, RDWSD, HGB, MAF, and LHD.
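For readers less familiar with these indicators, the sketch below shows how sensitivity, specificity, accuracy, PPV, NPV, F1 score, and Cohen's Kappa follow from a 2x2 confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not say which diagnosis was treated as the positive class. The function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute common diagnostic indicators from confusion-matrix counts."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / total
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    # Cohen's Kappa: observed agreement corrected for chance agreement
    p_chance = (((tp + fn) / total) * ((tp + fp) / total)
                + ((fp + tn) / total) * ((fn + tn) / total))
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical test-set counts (NOT from the study)
metrics = binary_metrics(tp=18, fn=2, fp=1, tn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC and the cutoff are not computed here because they depend on the model's continuous scores rather than on a single confusion matrix.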
Conclusion: We developed a discriminant model using ML algorithms based on erythrocyte parameters that rapidly distinguishes TT from IDA among patients with MHA, which may assist patients in taking preventive measures.