Comparison of 7 artificial intelligence models in predicting venous thromboembolism in COVID-19 patients

IF 3.4; Zone 3 (Medicine); JCR Q2 (Hematology)
Indika Rajakaruna, Mohammad Hossein Amirhosseini, Mike Makris, Mike Laffan, Yang Li, Deepa J. Arachchillage
Research and Practice in Thrombosis and Haemostasis, volume 9, issue 2, Article 102711. Published 2025-02-01. DOI: 10.1016/j.rpth.2025.102711. URL: https://www.sciencedirect.com/science/article/pii/S2475037925000354
Citations: 0

Abstract

Background

An artificial intelligence (AI) approach can be used to predict venous thromboembolism (VTE).

Objectives

To compare different AI models in predicting VTE using data from patients with COVID-19.

Methods

We ranked features by recursive feature elimination with two AI algorithms (logistic regression and random forest classifier), alongside standard statistical methods, to identify the significant factors that contribute to the development of VTE in COVID-19 patients, using a large dataset from “Coagulopathy associated with COVID-19,” a multicenter observational study. Using the selected significant features, we developed 7 AI models to predict the development of VTE during hospitalization: multilayer perceptron classifier, artificial neural network with backpropagation, eXtreme gradient boosting, support vector classifier, stochastic gradient descent classifier, random forest classifier, and logistic regression classifier. The models were validated and optimized with K-fold cross-validation and hyperparameter tuning. Their predictive power was tested on 2649 patients (33% of the 8027 patients overall), who had been set aside beforehand and were not used during model training or validation.
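The data partitioning described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the abstract gives only the totals (8027 patients, 33% held out), so the random (non-stratified) shuffle, the seed, the value K = 5, and the helper names `holdout_split` and `k_fold` are all assumptions made for the example.

```python
import random


def holdout_split(n_samples, test_fraction=0.33, seed=0):
    """Shuffle sample indices and reserve a fraction as an untouched test set.

    The seed and the plain random shuffle are assumptions for illustration.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for K-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 8027 patients overall, 33% held out for final testing (figures from the abstract).
train_idx, test_idx = holdout_split(8027)

# K = 5 is an assumed value; the abstract does not state K.
folds = list(k_fold(train_idx, k=5))

print(len(train_idx), len(test_idx), len(folds))
```

Only the training indices are passed to `k_fold`, so hyperparameter tuning never sees the held-out patients, which is the property the Methods paragraph emphasizes.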

Results

Age, female sex, white ethnicity, comorbidities (diabetes, liver disease, autoimmune disease), laboratory features (increased hemoglobin, white cell count, D-dimer, lactate dehydrogenase, ferritin), and the presence of multiorgan failure were the major factors associated with the development of thrombosis. The support vector classifier (SVC) model outperformed all other models, achieving an accuracy of 97%. It also led in precision (0.98), recall (0.97), and F1 score (0.97), and recorded the lowest log-loss score (0.112 on the test dataset), reflecting better model convergence and an improved fit to the data. Additionally, it achieved the highest area under the curve score (0.983).
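For readers less familiar with the metrics reported above, the following sketch shows how each is computed for a binary classifier. The labels and probabilities are made-up toy values, not study data, and the function names are hypothetical. The abstract does not say how precision, recall, and F1 were averaged across classes; this sketch computes them for the positive class only, which is an assumption.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels; lower is better."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive outranks a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (1 = VTE, 0 = no VTE).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.1, 0.2, 0.6, 0.05, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
loss = log_loss(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(metrics, loss, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas log-loss and area under the curve are computed from the predicted probabilities directly, which is why the two groups of figures can rank models differently.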

Conclusion

The SVC model delivered the best overall performance, outperforming the models reported in similar studies that developed deep-learning and machine-learning models for COVID-19.
Source journal metrics: CiteScore 5.60; self-citation rate 13.00%; articles published 212; review time 7 weeks