Machine-Learning Predictive Tool for the Individualized Prediction of Outcomes of Hematopoietic Cell Transplantation for Sickle Cell Disease: Registry-Based Study.

Impact factor: 2.0
JMIR AI. Publication date: 2025-09-15. DOI: 10.2196/64519
Rajagopal Subramaniam Chandrasekar, Michael Kane, Lakshmanan Krishnamurti
JMIR AI 2025;4:e64519. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12435087/pdf/
Citations: 0

Abstract

Background: Disease-modifying therapies reduce the severity of sickle cell disease (SCD), but hematopoietic cell transplantation (HCT) and, more recently, autologous gene therapy are the only treatments with curative potential for SCD. Registry-based studies provide population-level estimates, but they do not address the uncertainty about how HCT will turn out for an individual patient. Machine learning (ML) has the potential to identify generalizable predictive patterns and to quantify uncertainty in estimates, thereby improving clinical decision-making. No ML model currently exists for SCD, and ML models of HCT for other diseases focus on single outcomes rather than on all relevant outcomes.

Objective: This study aims to address this knowledge gap by developing and validating an individualized ML prediction model, SPRIGHT (Sickle Cell Predicting Outcomes of Hematopoietic Cell Transplantation), that uses multiple relevant pre-HCT features to predict key post-HCT clinical outcomes.

Methods: We applied a supervised random forest ML model to clinical parameters in a deidentified Center for International Blood and Marrow Transplant Research (CIBMTR) dataset of 1641 patients who underwent HCT between 1991 and 2021 and were followed for a median of 42.5 months (IQR 52.5; range 0.3-312.9). We applied forward and reverse feature selection methods to optimize the set of predictive variables. Because negative outcomes were few, the model was biased toward predicting positive outcomes; to counter this imbalance, we constructed a training dataset for each outcome, taking that outcome as the variable of interest, and performed 2-times repeated 10-fold cross-validation. SPRIGHT is a web-based individualized prediction tool accessible by smartphone, tablet, or personal computer. Its predictive variables are age, age group, Karnofsky or Lansky score, comorbidity index, recipient cytomegalovirus seropositivity, history of acute chest syndrome, need for exchange transfusion, occurrence and frequency of vaso-occlusive crisis (VOC) before HCT, and either a published or a custom regimen of chemotherapy or radiation conditioning, serotherapy, and graft-versus-host disease prophylaxis. SPRIGHT makes individualized predictions of overall survival (OS), event-free survival, graft failure, acute graft-versus-host disease (AGVHD), chronic graft-versus-host disease (CGVHD), and occurrence of VOC or stroke post-HCT.
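The resampling scheme named above, 2-times repeated 10-fold cross-validation, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function name `repeated_stratified_kfold`, the stratification by outcome class, and the synthetic labels are all hypothetical and are not taken from the study, which does not describe its splitting procedure in this abstract.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV.

    Stratification (keeping class proportions similar in every fold) is an
    assumption of this sketch; the abstract only states that 2-times
    repeated 10-fold cross-validation was performed.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold gets a similar share.
            for position, idx in enumerate(shuffled):
                folds[position % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


# Synthetic, imbalanced labels (hypothetical, not registry data):
# 180 patients without the event (0) and 20 with it (1).
labels = [0] * 180 + [1] * 20
splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 2 repeats x 10 folds = 20
```

With a rare outcome, unstratified folds can end up with very few or no events in a test fold, which makes fold-level metrics unstable; dealing each class separately avoids that in this toy setting.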

Results: Discrimination, that is, the model's ability to distinguish between positive and negative classes, was evaluated using the area under the curve (AUC), accuracy, and balanced accuracy. Discrimination met or exceeded published predictive benchmarks, with AUCs for OS (0.7925), event-free survival (0.7900), graft failure (0.8024), AGVHD (0.6793), CGVHD (0.7320), and VOC post-HCT (0.8779). SPRIGHT showed good calibration, with slopes of 0.87-0.96 and small negative intercepts (-0.01 to 0.03), for 4 out of the 5 outcomes. However, OS exhibited nonideal calibration, which may reflect the high OS in all subgroups.
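To make the reported metrics concrete, the following standard-library Python sketch computes AUC, accuracy, balanced accuracy, and a calibration slope and intercept on small synthetic data. Everything here is an assumption for illustration: the function names, the 0.5 decision threshold, the toy predictions, and the choice to estimate calibration by jointly fitting a logistic regression of the outcome on the logit of the predicted probability. The abstract does not state how the study computed these quantities.

```python
import math


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given threshold."""
    correct = sum(1 for y, s in zip(y_true, scores) if (s >= threshold) == (y == 1))
    return correct / len(y_true)


def balanced_accuracy(y_true, scores, threshold=0.5):
    """Mean of sensitivity and specificity at the given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def calibration_slope_intercept(y_true, probs, n_iter=25):
    """Fit y ~ intercept + slope * logit(p) by Newton-Raphson.

    Probabilities must lie strictly between 0 and 1. A slope near 1 and an
    intercept near 0 indicate good calibration.
    """
    x = [math.log(p / (1.0 - p)) for p in probs]
    a, b = 0.0, 0.0  # intercept, slope
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu
            g_b += (yi - mu) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b, a


# Synthetic imbalanced example (hypothetical): 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.6, 0.4, 0.9]
print(roc_auc(y_true, scores))            # 0.9375
print(accuracy(y_true, scores))           # 0.8
print(balanced_accuracy(y_true, scores))  # 0.6875

# Perfectly calibrated toy set: observed event rates equal the predictions.
cal_probs = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
cal_y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
slope, intercept = calibration_slope_intercept(cal_y, cal_probs)
print(slope, intercept)  # slope close to 1, intercept close to 0
```

In the imbalanced example, accuracy (0.8) looks better than balanced accuracy (0.6875) because most cases are negatives that are easy to classify correctly; this is the usual reason to report balanced accuracy alongside accuracy when one outcome class is rare.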

Conclusions: A web-based ML prediction tool incorporating multiple clinically relevant variables predicts key clinical outcomes with a high level of discrimination and calibration and has potential for use in shared decision-making.
