Prediction model for major bleeding in anticoagulated patients with cancer-associated venous thromboembolism using machine learning and natural language processing

Andrés J. Muñoz Martín, Ramón Lecumberri, Juan Carlos Souto, Berta Obispo, Antonio Sanchez, Jorge Aparicio, Cristina Aguayo, David Gutierrez, Andrés García Palomo, Diego Benavent, Miren Taberna, María Carmen Viñuela-Benéitez, Daniel Arumi, Miguel Ángel Hernández-Presa
Clinical and Translational Oncology (2024-09-14). DOI: 10.1007/s12094-024-03586-2

Abstract

Purpose

We developed a predictive model to assess the risk of major bleeding (MB) within 6 months of primary venous thromboembolism (VTE) in cancer patients receiving anticoagulant treatment. We also sought to describe the prevalence and incidence of VTE in cancer patients, and to describe clinical characteristics at baseline and bleeding events during follow-up in patients receiving anticoagulants.

Methods

This observational, retrospective, multicenter study used natural language processing and machine learning (ML) to analyze unstructured clinical data from the electronic health records of nine Spanish hospitals between 2014 and 2018. All adult cancer patients with VTE receiving anticoagulants were included. Both clinically driven and ML-driven feature selection were performed to identify MB predictors. Logistic regression (LR), decision tree (DT), and random forest (RF) algorithms were used to train predictive models, which were validated on a hold-out dataset and compared with the previously developed CAT-BLEED score.
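The abstract does not describe the modeling code. As a minimal stdlib-only sketch, assuming numeric tabular predictors and a binary label (1 = major bleeding within 6 months, 0 = none), the following shows a stratified hold-out split and a logistic regression, one of the three algorithms named, fit by plain batch gradient descent. The function names, the 20% default test fraction, the learning rate, and the epoch count are illustrative assumptions, not the authors' settings.

```python
import math
import random


def stratified_holdout(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), keeping the MB rate similar in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def standardize(train_x, x):
    """Scale features using training-set mean/SD only, so the hold-out set does not leak."""
    d = len(train_x[0])
    means = [sum(r[j] for r in train_x) / len(train_x) for j in range(d)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in train_x) / len(train_x)) or 1.0
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in x]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.1, epochs=500):
    """Unregularized logistic regression fit by batch gradient descent on mean log-loss."""
    n, d = len(x), len(x[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of MB for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in x]
```

Stratifying the split keeps the roughly 1-in-10 event rate comparable between training and hold-out data, and fitting the scaling on the training rows alone keeps the hold-out evaluation honest. Decision trees and random forests would slot in at the `fit_logistic` step.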

Results

Of the 2,893,108 cancer patients screened, in-hospital VTE prevalence was 5.8% and the annual incidence ranged from 2.7% to 3.9%. We identified 21,227 patients with active cancer and VTE who were receiving anticoagulants (53.9% men; median age, 70 years). MB occurred in 10.9% of patients within the first 6 months after VTE diagnosis. MB predictors included hemoglobin, metastasis, age, platelets, leukocytes, and serum creatinine. The LR, DT, and RF models had AUC-ROC values (95% confidence interval) of 0.60 (0.55, 0.65), 0.60 (0.55, 0.65), and 0.61 (0.56, 0.66), respectively, outperforming the CAT-BLEED score, which had an AUC-ROC of 0.53 (0.48, 0.59).
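The abstract reports AUC-ROC values with 95% confidence intervals but does not say how the intervals were obtained. DeLong's method and bootstrap resampling are both common. The sketch below assumes a patient-level percentile bootstrap on the hold-out set, with labels coded 1 for MB within 6 months and 0 otherwise. All function names are hypothetical. Because every model and the CAT-BLEED score are scored on the same hold-out patients, it also includes a paired bootstrap of the AUC difference. That is a more direct comparison than checking whether two separate intervals overlap.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC via the Mann-Whitney rank-sum identity; tied scores get average ranks."""
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def _percentile_interval(values, alpha):
    """Simple percentile interval from sorted bootstrap statistics."""
    values = sorted(values)
    m = len(values)
    return values[int(alpha / 2 * m)], values[int((1 - alpha / 2) * m) - 1]


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) interval, resampling patients."""
    point = auc_roc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [labels[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that lost a class
            stats.append(auc_roc(yb, [scores[i] for i in idx]))
    lo, hi = _percentile_interval(stats, alpha)
    return point, lo, hi


def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """AUC(a) - AUC(b) with a percentile interval, resampling the same patients for both."""
    diff = auc_roc(labels, scores_a) - auc_roc(labels, scores_b)
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [labels[i] for i in idx]
        if 0 < sum(yb) < n:
            stats.append(
                auc_roc(yb, [scores_a[i] for i in idx]) - auc_roc(yb, [scores_b[i] for i in idx])
            )
    lo, hi = _percentile_interval(stats, alpha)
    return diff, lo, hi
```

In practice `scores` would be each model's predicted MB probabilities (or the CAT-BLEED score) for the hold-out patients. An interval on the paired difference that excludes 0 would support a real advantage on that hold-out set.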

Conclusions

Our study shows encouraging results in identifying anticoagulated patients with cancer-associated VTE who are at high risk of MB.

