Development of an ensemble prediction model for acute graft-versus-host disease in allogeneic transplantation based on machine learning.

IF 3.8 · Zone 3, Medicine · JCR Q2, Medical Informatics
Lin Song, Xingwei Wu, Mengjia Xu, Ling Xue, Xun Yu, Zongqi Cheng, Chenrong Huang, Liyan Miao
BMC Medical Informatics and Decision Making, 25(1):234. Published 2025-07-01. Journal Article. DOI: https://doi.org/10.1186/s12911-025-03059-8. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12219984/pdf/

Abstract

Background: Acute graft-versus-host disease (aGVHD) is a major post-transplantation complication and one of the most significant causes of non-relapse-related death. However, the volume and complexity of clinical data make aGVHD difficult to predict. Machine learning (ML), a branch of artificial intelligence, has been introduced in medicine because it can process complex, high-dimensional variables quickly and capture nonlinear relationships. Previous ML models, however, did not consider the effect of immunosuppressant exposure. The purpose of this study was therefore to develop and optimize Cox regression and machine learning models that predict the risk of aGVHD, with cyclosporin A exposure and common clinical factors included as variables.

Methods: The data were first preprocessed and then randomly split into training and validation sets at an 8:2 ratio. A Cox regression model was constructed on the training set. In parallel, correlation analysis and recursive feature elimination were used for feature screening before machine learning model development. Fifteen algorithms were then used to build models, and an ensemble model was built by soft voting over the five best-performing algorithms. Area under the curve (AUC) was the main metric used to evaluate model performance on the validation set, while a nomogram and SHAP were applied to interpret the variables.
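The pipeline described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the three base learners stand in for the study's unnamed top five, the number of retained features (15) and the stratified split are arbitrary choices made here, and the correlation-analysis step is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): 479 rows, 47 columns,
# roughly one third positive cases.
X, y = make_classification(
    n_samples=479, n_features=47, n_informative=8,
    weights=[0.66, 0.34], random_state=0,
)

# 8:2 split. Stratifying on the outcome is an assumption made here;
# the abstract only says the allocation was random.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Recursive feature elimination, fitted on the training data only.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=15)

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)

pipe = make_pipeline(StandardScaler(), selector, ensemble)
pipe.fit(X_train, y_train)

# AUC on the held-out validation set, using the positive-class probability.
val_prob = pipe.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, val_prob)
print(f"validation AUC on synthetic data: {auc:.3f}")
```

Placing the scaler and the feature selector inside the pipeline keeps the validation set out of every fitting step, so the reported AUC is not inflated by information leaking from the held-out data.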

Results: A total of 479 patients and 47 variables were included in the study. The incidence of grade II-IV aGVHD was 33.61%. The AUC of the Cox regression model on the validation set was 0.625. In contrast, the new ensemble model had better predictive ability (AUC = 0.776, Accuracy = 0.729, Precision = 0.667, Recall = 0.375, F1-score = 0.480). In addition to variables identified in previous studies, some rarely reported risk factors were found, such as quinolone, blood urea nitrogen and alkaline phosphatase.
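As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The helper name `f1_from` below is hypothetical, and the inputs are the rounded values quoted in the abstract.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported for the ensemble model.
f1 = f1_from(0.667, 0.375)
print(round(f1, 3))  # 0.48
```

The result agrees with the reported F1-score of 0.480. The low recall relative to precision means the model misses most true aGVHD cases at the decision threshold used, which matters if the goal is early identification of high-risk patients.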

Conclusions: In summary, a new ensemble model with promising accuracy was established to predict grade II-IV classic aGVHD in allo-HSCT patients. It will help identify high-risk patients at an early stage and thus reduce the incidence of aGVHD.

Clinical trial number: Not applicable.

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.