Detection of β-Thalassemia trait from a heterogeneous population with red cell indices and parameters

Impact factor: 7.0; Zone 2 (Medicine); JCR Q1 (Biology)
Subrata Saha, Prashant Sharma, Atul Kumar Jain, Bapi Dutta, Luis Martínez, Sarkaft Saleh, Tuphan Kanti Dolai, Anilava Kaviraj, Tanmay Sanyal, Izabela Nielsen, Reena Das
Computers in Biology and Medicine, volume 192, Article 110151. Published 2025-04-26. DOI: 10.1016/j.compbiomed.2025.110151. Not open access.
https://www.sciencedirect.com/science/article/pii/S0010482525005025
Citations: 0

Abstract

Background:

India is home to about 42 million people with β-thalassemia trait (βTT), which makes βTT screening necessary to limit the spread of the disease. Over the years, researchers have developed discrimination formulae based on red blood cell (RBC) parameters to distinguish βTT from iron deficiency anemia (IDA). However, screening programs often also encounter normal subjects (NS) and carriers of other hemoglobinopathy variants. Because existing formulae give a binary outcome, they often group NS or variants such as the Hemoglobin E (HbE) trait with either βTT or IDA. Therefore, βTT, IDA, HbE, and NS need to be separated in mixed-population data for rational screening.
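
To make the "binary outcome" limitation concrete, here is a minimal sketch of one widely cited discrimination formula, the Mentzer index (MCV divided by RBC count, with a conventional cutoff of 13). This is an illustration only: the abstract does not say which 25 formulae were evaluated, so the choice of the Mentzer index, the class name `RedCellIndices`, and the function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RedCellIndices:
    """Hypothetical container for the two indices the Mentzer index needs."""
    mcv_fl: float            # mean corpuscular volume, in femtolitres
    rbc_10e12_per_l: float   # red blood cell count, in 10^12 cells per litre


def mentzer_index(indices: RedCellIndices) -> float:
    """Mentzer index = MCV / RBC count."""
    return indices.mcv_fl / indices.rbc_10e12_per_l


def mentzer_call(indices: RedCellIndices, cutoff: float = 13.0) -> str:
    """Binary call: below the cutoff suggests betaTT, otherwise IDA.

    There is no third or fourth label, so a normal subject or an HbE
    carrier is necessarily assigned to one of these two classes.
    """
    return "betaTT" if mentzer_index(indices) < cutoff else "IDA"


if __name__ == "__main__":
    # Made-up values, not taken from the study data.
    print(mentzer_call(RedCellIndices(mcv_fl=65.0, rbc_10e12_per_l=6.0)))
    print(mentzer_call(RedCellIndices(mcv_fl=70.0, rbc_10e12_per_l=4.0)))
```

With the made-up inputs above, the first sample has an index of about 10.8 and is called `betaTT`, and the second has an index of 17.5 and is called `IDA`. Whatever the true status of a sample, the function can only return one of those two labels, which is the limitation the Background describes.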

Methods:

A test data set of 2877 subjects (1226 NS, 425 HbE, 223 IDA, and 1003 βTT) was collected from the Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India, and NRS Medical College and Hospital, Kolkata, India. First, we evaluated the performance of 25 discrimination formulae and four machine learning algorithms (MLAs): Multi-Layer Perceptron (MLP), Neighborhood Components Analysis (NCA), eXtreme Gradient Boosting Classifier (XGBC), and SKope-Rules (SKR), using seven performance measures. Based on these measures, we selected four discrimination formulae and two MLAs for further evaluation. SHapley Additive exPlanations (SHAP) were used to explore the interpretability of the outcomes. We generated four rules using the SKR algorithm to discriminate between hemoglobinopathy variants. Finally, we propose a step-wise implementation scheme for screening.

Results:

The results show that no single formula achieves high performance on all measures. On the data set containing βTT and IDA samples, the best-performing formulae were SCS_βTT for sensitivity (SE) and negative predictive value (NPV), Sirachainan for specificity (SP) and positive predictive value (PPV), CRUISE for the Youden index (YI), and RF-4 for the Matthews correlation coefficient (MCC) and κ-coefficient. Among the MLAs, SKR performed best on SP, YI, and PPV, and XGBC on the remaining measures. On the heterogeneous data set, the MCC and κ-coefficient of these four formulae decreased, whereas the performance of the two MLAs remained steady. The proposed scheme achieved about 97.33–97.62% accuracy when applied to two validation data sets collected from different sources.
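
For readers unfamiliar with the measures named above (SE, SP, PPV, NPV, YI, MCC, and κ), the following minimal sketch computes them from a 2×2 confusion matrix using their standard textbook definitions. The function name `binary_measures` and the counts in the usage example are hypothetical and are not taken from the paper; the sketch also assumes no row or column of the matrix is empty, so no denominator is zero.

```python
import math


def binary_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard screening measures from a 2x2 confusion matrix.

    tp/fn: positives called positive/negative; tn/fp: negatives called
    negative/positive. Assumes every marginal total is non-zero.
    """
    n = tp + fp + fn + tn
    se = tp / (tp + fn)        # sensitivity
    sp = tn / (tn + fp)        # specificity
    ppv = tp / (tp + fp)       # positive predictive value
    npv = tn / (tn + fn)       # negative predictive value
    yi = se + sp - 1.0         # Youden index

    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_observed - p_chance) / (1.0 - p_chance)

    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv,
            "YI": yi, "MCC": mcc, "kappa": kappa}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in binary_measures(tp=90, fp=10, fn=10, tn=90).items():
        print(f"{name}: {value:.3f}")
```

Because SE and NPV reward catching every positive while SP and PPV reward avoiding false alarms, a formula tuned for one pair can rank poorly on the other, which is consistent with the finding that different formulae lead on different measures.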

Conclusion:

The performance of the XGBC and SKR algorithms in multi-class classification remains steady when segregating different hemoglobinopathy variants. The developed rules may help in pre-screening individuals and may offer a sustainable, cost-effective, and resource-saving approach to screening mixed populations with multiple variants.
Source journal
Computers in Biology and Medicine (Engineering and Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.