A Machine Learning Approach to Reference Interval Estimation for Red Cell Parameters in a South and East Asian Population.

Impact factor: 4.0 | Zone 2 (Medicine) | JCR Q1 (Medical Laboratory Technology)
Veera Sekaran Nadarajan, Pavai Sthaneshwar, Jia Qi Lim, Angeli Ambayya, Putri Junaidah Megat Yunus
DOI: 10.3343/alm.2025.0027 (https://doi.org/10.3343/alm.2025.0027)
Journal: Annals of Laboratory Medicine
Published: 2025-07-03
Publication type: Journal Article
Citations: 0

Abstract

Background
Iron deficiency (ID) and hemoglobinopathies are highly prevalent in Southeast Asia. Accurate estimation of reference intervals (RIs) for red cell parameters is complicated by the need to exclude individuals with these conditions from the reference population. Indirect RI estimation using machine learning could help overcome these challenges.

Methods
We developed a binary classification model using eXtreme Gradient Boosting (XGB) to distinguish normal individuals from those with ID, hemoglobinopathies, or other anemias. The model was trained on an annotated dataset comprising 5,520 complete blood count (CBC) results and validated with a holdout dataset of 2,367 CBC results. The model was then applied to an independent dataset of 64,100 CBC results to identify individuals predicted to be normal, and RIs were estimated from this subset using the refineR algorithm.

Results
The XGB model achieved an area under the receiver operating characteristic (ROC) curve of 0.97 (95% confidence interval: 0.96-0.97) for distinguishing individuals with normal values from those with abnormal values. Among individuals within the independent dataset, 40,300 (62.9%) were predicted to be normal. The refineR-based reference limits (RLs) derived from this subset approximated those obtained through a direct approach. Improvements in the accuracy of indirect RL estimates were most evident for hematocrit, hemoglobin, and red cell concentrations.

Conclusions
Combining XGB with refineR to indirectly derive RIs for red cell parameters improved accuracy and yielded results comparable with those of directly derived RIs. A further benefit was the capacity to generate sex- and age-specific ranges, which has remained difficult to achieve through direct approaches.
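The two-stage idea in the Methods (score each result with a classifier, keep the predicted-normal subset, then estimate limits from that subset) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are simulated, the "classifier score" is a random stand-in rather than an XGB model, and a plain 2.5th/97.5th percentile calculation stands in for the refineR algorithm. All names, distributions, and the 0.5 cutoff are assumptions chosen for illustration.

```python
import random
import statistics


def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = abnormal, 0 = normal.
    scores: predicted probability of being abnormal.
    A tie between an abnormal and a normal score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def reference_interval(values):
    """Central 95% interval from empirical percentiles.

    Assumption: a simple nonparametric stand-in for refineR.
    quantiles(n=40) returns cut points at 2.5%, 5%, ..., 97.5%.
    """
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


rng = random.Random(0)  # fixed seed so the simulation is reproducible

# Simulated cohort (all numbers hypothetical): true label, hemoglobin in
# g/dL, and a stand-in "probability of abnormal" score.
records = []
for _ in range(700):  # truly normal individuals
    records.append((0, rng.gauss(14.0, 1.0), rng.betavariate(2, 8)))
for _ in range(300):  # truly abnormal individuals (e.g. anemia)
    records.append((1, rng.gauss(10.0, 1.5), rng.betavariate(8, 2)))

labels = [label for label, _, _ in records]
scores = [score for _, _, score in records]
model_auroc = auroc(labels, scores)

# Stage 1: keep only results the stand-in classifier calls normal.
all_hb = [hb for _, hb, _ in records]
predicted_normal_hb = [hb for _, hb, score in records if score < 0.5]

# Stage 2: estimate reference limits with and without the filter.
raw_lo, raw_hi = reference_interval(all_hb)
flt_lo, flt_hi = reference_interval(predicted_normal_hb)

print(f"AUROC of stand-in score: {model_auroc:.3f}")
print(f"Unfiltered limits: {raw_lo:.1f} to {raw_hi:.1f} g/dL")
print(f"Predicted-normal limits: {flt_lo:.1f} to {flt_hi:.1f} g/dL")
```

In this simulation the unfiltered lower limit is pulled down by the abnormal results, while the limits from the predicted-normal subset sit closer to the simulated healthy distribution. That is the motivation for filtering before indirect estimation; the size of the effect in real data depends on the prevalence of abnormal results and on classifier performance.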
Source journal
Annals of Laboratory Medicine (Medical Laboratory Technology)
CiteScore: 8.30
Self-citation rate: 12.20%
Articles published: 100
Review time: 6-12 weeks
About the journal: Annals of Laboratory Medicine is the official journal of the Korean Society for Laboratory Medicine. The journal title was changed from the Korean Journal of Laboratory Medicine (ISSN 1598-6535) starting with the January 2012 issue. The JCR 2017 impact factor of Ann Lab Med was 1.916.