A Machine Learning Approach to Reference Interval Estimation for Red Cell Parameters in a South and East Asian Population
Authors: Veera Sekaran Nadarajan, Pavai Sthaneshwar, Jia Qi Lim, Angeli Ambayya, Putri Junaidah Megat Yunus
Journal: Annals of Laboratory Medicine (JCR category: Medical Laboratory Technology, Q1; impact factor 4.0)
Publication type: Journal Article
Published: 2025-07-03
DOI: https://doi.org/10.3343/alm.2025.0027
Citations: 0
Abstract
Background
Iron deficiency (ID) and hemoglobinopathies are highly prevalent in Southeast Asia. Accurate estimation of reference intervals (RIs) for red cell parameters is complicated by the need to exclude individuals with these conditions from the reference population. Indirect RI estimations using machine learning could help overcome these challenges.
Methods
We developed a binary classification model using eXtreme Gradient Boosting (XGB) to distinguish normal individuals from those with ID, hemoglobinopathies, or other anemias. The model was trained on an annotated dataset comprising 5,520 complete blood count (CBC) results and validated with a holdout dataset of 2,367 CBC results. An independent dataset of 64,100 CBC results was used to identify individuals predicted to be normal, from which RIs were estimated using the refineR algorithm.
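The two-stage workflow in this paragraph (screen out results predicted to be abnormal, then estimate limits from the remainder) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `p_abnormal` field, the 0.5 threshold, and the function names are hypothetical, and a plain nonparametric central 95% interval (2.5th and 97.5th percentiles) stands in for refineR, which is a separate model-based algorithm.

```python
from statistics import quantiles


def select_predicted_normal(records, threshold=0.5):
    """Keep records whose predicted probability of being abnormal is below the threshold.

    `p_abnormal` is a hypothetical field holding a classifier's output score.
    """
    return [r for r in records if r["p_abnormal"] < threshold]


def central_95_limits(values):
    """Return the 2.5th and 97.5th percentiles of the values.

    This simple nonparametric interval is a stand-in for refineR, not a
    reimplementation of it.
    """
    # n=1000 yields 999 cut points in steps of 0.1%;
    # index 24 is the 2.5% point and index 974 is the 97.5% point.
    cuts = quantiles(values, n=1000, method="inclusive")
    return cuts[24], cuts[974]


def indirect_limits(records, analyte, threshold=0.5):
    """Filter to predicted-normal records, then estimate limits for one analyte."""
    normal = select_predicted_normal(records, threshold)
    return central_95_limits([r[analyte] for r in normal])
```

Filtering first means the interval estimator sees a sample with fewer pathological results, which is the rationale the abstract gives for pairing a classifier with an indirect method.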
Results
The XGB model achieved an area under the receiver operating characteristic (ROC) curve of 0.97 (95% confidence interval: 0.96-0.97) for distinguishing individuals with normal values from those with abnormal values. Of the 64,100 results in the independent dataset, 40,300 (62.9%) were predicted to be normal. The refineR-based reference limits (RLs) derived from this subset approximated those obtained through a direct approach. Improvements in the accuracy of indirect RL estimates were most evident for hematocrit, hemoglobin, and red cell concentrations.
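One way to read the reported area under the ROC curve: it equals the probability that a randomly chosen abnormal result receives a higher model score than a randomly chosen normal one, with ties counted as half. The sketch below computes it by that definition. The function name and the label coding (1 = abnormal, 0 = normal) are illustrative assumptions, and the abstract does not state how the confidence interval was obtained, so none is computed here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for abnormal, 0 for normal (assumed coding).
    scores: higher means "more likely abnormal".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one result from each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # abnormal result ranked above normal result
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of results, which is acceptable for a holdout set of a few thousand records; a rank-based formulation would be preferable for much larger sets.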
Conclusions
Combining XGB with refineR to indirectly derive RIs for red cell parameters improved the accuracy of the indirect estimates and yielded results comparable with directly derived RIs. A further benefit was the capacity to generate sex- and age-specific ranges, which remains difficult to achieve through direct approaches.
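Sex- and age-specific ranges become practical once a large pool of predicted-normal results is available, because each stratum can still hold enough data. The sketch below illustrates the idea under stated assumptions: the age bands, the `sex` and `age` field names, and the minimum group size of 120 are all hypothetical choices, not taken from the paper, and simple 2.5th and 97.5th percentiles again stand in for refineR.

```python
from collections import defaultdict
from statistics import quantiles


def age_band(age, edges=(18, 40, 60)):
    """Map an age in years to a band label (hypothetical band edges)."""
    lower = None
    for edge in edges:
        if age < edge:
            return f"<{edge}" if lower is None else f"{lower}-{edge - 1}"
        lower = edge
    return f"{lower}+"


def stratified_limits(records, analyte, min_n=120):
    """Estimate 2.5th/97.5th percentile limits per (sex, age band) group.

    Groups with fewer than min_n results are skipped; min_n=120 is an
    assumed cutoff for a stable nonparametric estimate.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["sex"], age_band(r["age"]))].append(r[analyte])

    limits = {}
    for key, values in groups.items():
        if len(values) < min_n:
            continue  # too few results in this stratum
        cuts = quantiles(values, n=1000, method="inclusive")
        limits[key] = (cuts[24], cuts[974])  # 2.5% and 97.5% points
    return limits
```

Skipping small strata rather than reporting unstable limits is a design choice made for this sketch; merging adjacent age bands would be an alternative.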
About the journal:
Annals of Laboratory Medicine is the official journal of the Korean Society for Laboratory Medicine. The journal title was changed from the Korean Journal of Laboratory Medicine (ISSN 1598-6535) starting with the January 2012 issue. The JCR 2017 impact factor of Ann Lab Med was 1.916.