Comparison of 7 artificial intelligence models in predicting venous thromboembolism in COVID-19 patients
Indika Rajakaruna, Mohammad Hossein Amirhosseini, Mike Makris, Mike Laffan, Yang Li, Deepa J. Arachchillage
Research and Practice in Thrombosis and Haemostasis, 9(2), Article 102711. Published 2025-02-01. DOI: 10.1016/j.rpth.2025.102711
https://www.sciencedirect.com/science/article/pii/S2475037925000354
Abstract
Background
An artificial intelligence (AI) approach can be used to predict venous thromboembolism (VTE).
Objectives
To compare different AI models in predicting VTE using data from patients with COVID-19.
Methods
We used feature ranking through recursive feature elimination with AI algorithms (logistic regression and random forest classifier), alongside standard statistical methods, to identify the significant factors that contribute to the development of VTE in COVID-19 patients, using a large dataset from “Coagulopathy associated with COVID-19,” a multicenter observational study. We developed 7 AI models (multilayer perceptron classifier, artificial neural network with backpropagation, eXtreme gradient boosting, support vector classifier, stochastic gradient descent classifier, random forest classifier, and logistic regression classifier) using the selected significant features to predict the development of VTE during hospitalization, and used K-fold cross-validation and hyperparameter tuning to validate and optimize the models. The models’ predictive power was tested on 2649 patients (33% of the 8027 patients overall), who had been separated beforehand and were not used during the model training and validation stages.
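The workflow described above (a held-out 33% test split, recursive feature elimination, and K-fold cross-validation with hyperparameter tuning) could be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, the number of retained features (8), the fold count (5), and the parameter grid are all hypothetical, and only the support vector classifier is shown. Placing the feature selection inside the pipeline is a design choice made here so that each cross-validation fold ranks features on its own training portion.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the patient data (assumption: the study dataset is not used here).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=0
)

# Set aside 33% of the samples before any feature selection or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0
)

# Scale, rank features by recursive feature elimination around a
# logistic regression, then classify with an SVC.
pipeline = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=8),  # 8 is hypothetical
    SVC(),
)

# Hyperparameter tuning with stratified 5-fold cross-validation
# on the training portion only (grid values are hypothetical).
search = GridSearchCV(
    pipeline,
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# With scoring="roc_auc", score() reports the area under the ROC curve
# of the refitted best model on the untouched test split.
test_auc = search.score(X_test, y_test)
n_selected = int(search.best_estimator_.named_steps["rfe"].support_.sum())
print(search.best_params_, n_selected, round(test_auc, 3))
```

The same `GridSearchCV` wrapper could be reused with any of the other six classifiers by swapping the final pipeline step and its parameter grid.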
Results
Age, female sex, white ethnicity, comorbidities (diabetes, liver disease, autoimmune disease), laboratory features (increased hemoglobin, white cell count, D-dimer, lactate dehydrogenase, ferritin), and the presence of multiorgan failure were major factors associated with the development of thrombosis. The support vector classifier (SVC) model outperformed all other models, achieving an accuracy of 97%. The SVC model also led in precision (0.98), recall (0.97), and F1 score (0.97), and recorded the lowest log-loss score (0.112 on the test dataset), reflecting better model convergence and an improved fit to the data. Additionally, it achieved the highest area under the curve score (0.983).
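For readers less familiar with the metrics reported above, the following minimal sketch computes accuracy, precision, recall, F1 score, log-loss, and area under the ROC curve for a binary classifier using only the Python standard library. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are assumptions for illustration; the abstract does not state how the study averaged or thresholded its metrics.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Return common binary-classification metrics as a dict.

    y_true: list of 0/1 labels; y_prob: predicted probability of class 1.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )

    # Log-loss: mean negative log-likelihood, with probabilities clipped
    # away from 0 and 1 to keep the logarithm finite.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / len(y_true)

    # AUC as the fraction of (positive, negative) pairs in which the
    # positive case receives the higher score; ties count as one half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    auc = wins / (len(pos) * len(neg))

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
        "auc": auc,
    }


# Toy example (hypothetical labels and predicted probabilities).
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
print(metrics)
```

Lower log-loss indicates that predicted probabilities sit closer to the true labels, which is why it is read alongside threshold-based metrics such as accuracy.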
Conclusion
The SVC model delivered the best overall performance, outperforming the models reported in similar studies that developed deep learning and machine-learning models for COVID-19.