Development and validation of the VTE-BERT natural language processing model for venous thromboembolism.

Impact factor: 5.0; Zone 2 (Medicine); JCR Q1 (Hematology)

Journal of Thrombosis and Haemostasis. Pub Date: 2025-08-05. DOI: 10.1016/j.jtha.2025.07.021

Omid Jafari, Shengling Ma, Barbara D Lam, Jun Y Jiang, Emily Zhou, Mrinal Ranjan, Justine Ryu, Raka Bandyo, Arash Maghsoudi, Bo Peng, Christopher I Amos, Abiodun Oluyomi, Nathanael R Fillmore, Jennifer La, Ang Li


Citations: 0

Abstract


Development and validation of venous thromboembolism-bidirectional encoder representations from transformers (VTE-BERT) natural language processing model.

Background: Accurate and rapid phenotyping of venous thromboembolism (VTE) in longitudinal studies is important. A natural language processing (NLP) tool externally validated in representative patients is lacking.

Objectives: To train and validate an efficient NLP model to detect incident VTE events.

Methods: We designed a novel NLP platform, NLPMed, to assist thrombosis researchers with data preprocessing, phenotype annotation, language model finetuning, and NLP application. Using clinical notes, discharge summaries, and radiology reports from patients with cancer at 2 healthcare institutions, we finetuned Bio_Clinical Bidirectional Encoder Representations from Transformers (BERT) to develop VTE-BERT. The new model was trained to detect acute VTE events and their anatomical locations longitudinally. We internally and externally validated the model's performance in 2 randomly sampled cohorts of patients with advanced cancer.
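One way to read "longitudinally" here is that dated note-level predictions are collapsed into a first (incident) event per patient. The sketch below assumes that interpretation; the abstract does not describe the actual aggregation rule. The `NotePrediction` record, its fields, and the `first_vte_event` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class NotePrediction:
    """Hypothetical note-level model output (not the paper's schema)."""
    patient_id: str
    note_date: date
    acute_vte: bool          # model flagged an acute VTE event in this note
    location: Optional[str]  # anatomical location label, if flagged


def first_vte_event(preds: Iterable[NotePrediction]) -> Dict[str, NotePrediction]:
    """Return the earliest positive note per patient (assumed incident event)."""
    earliest: Dict[str, NotePrediction] = {}
    for p in preds:
        if not p.acute_vte:
            continue  # negative notes never define an event
        current = earliest.get(p.patient_id)
        if current is None or p.note_date < current.note_date:
            earliest[p.patient_id] = p
    return earliest
```

Patients with no positive note are simply absent from the returned mapping, which keeps "no event" distinct from "event with unknown date".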

Results: The training cohort consisted of 715 patients and 14 013 annotated notes with ≥1 VTE keyword from the Harris Health System. The internal validation cohort included 400 additional patients with 7190 VTE keyword-containing notes from Harris Health System. The external validation cohort included 400 patients with 7371 VTE keyword-containing notes from the national Veterans Affairs healthcare system. VTE-BERT was trained until it reached a precision of 95% and recall of 98% on the patient level. Using independent datasets, the model achieved precision and recall of 95% and 91% in internal validation and of 85% and 92% in external validation.
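Patient-level precision and recall, as reported above, compare the set of patients the model flags with the set confirmed by chart review: precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch of that calculation follows. The function name and the patient identifiers are hypothetical, and the toy numbers are not study data.

```python
from typing import Set, Tuple


def patient_level_precision_recall(
    predicted: Set[str], actual: Set[str]
) -> Tuple[float, float]:
    """Compute precision and recall over sets of patient IDs.

    predicted: patients with at least one model-flagged VTE event
    actual:    patients with a VTE event per the reference standard
    """
    tp = len(predicted & actual)   # flagged and truly positive
    fp = len(predicted - actual)   # flagged but not confirmed
    fn = len(actual - predicted)   # confirmed but missed by the model
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example with made-up IDs: 3 true positives, 1 false positive, 1 miss.
model_flagged = {"p1", "p2", "p3", "p4"}
chart_review = {"p1", "p2", "p3", "p5"}
print(patient_level_precision_recall(model_flagged, chart_review))  # (0.75, 0.75)
```

Working with sets of patient IDs rather than notes means a patient with many positive notes counts once, which matches the patient-level framing of the reported metrics.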

Conclusion: We trained and externally validated an efficient NLP model to detect incident VTE events longitudinally. We believe its adoption will accelerate thrombosis research by improving VTE detection at scale and decreasing the time and expense involved with manual chart review in big data epidemiological studies.


Source journal

Journal of Thrombosis and Haemostasis (Medicine: Peripheral Vascular Disease)

CiteScore: 24.30
Self-citation rate: 3.80%
Articles published: 321
Review time: 1 month

About the journal: The Journal of Thrombosis and Haemostasis (JTH) is the official journal of the International Society on Thrombosis and Haemostasis. It is dedicated to advancing science related to thrombosis, bleeding disorders, and vascular biology through the dissemination and exchange of information and ideas within the global research community.

Types of publications: original research reports; state-of-the-art reviews; brief reports; case reports; invited commentaries on publications in the Journal; forum articles; correspondence; announcements.

Scope of contributions: Editors invite contributions from both fundamental and clinical domains. These include basic manuscripts on blood coagulation and fibrinolysis; studies on proteins and reactions related to thrombosis and haemostasis; research on blood platelets and their interactions with other biological systems, such as the vessel wall, blood cells, and invading organisms; and clinical manuscripts covering topics including venous thrombosis, arterial disease, hemophilia, bleeding disorders, and platelet diseases. Clinical manuscripts may encompass etiology, diagnostics, prognosis, prevention, and treatment strategies.