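Development and validation of venous thromboembolism-bidirectional encoder representations from transformers (VTE-BERT) natural language processing model.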

Impact factor: 5.0 | CAS region 2 (Medicine) | JCR Q1 (Hematology)
Omid Jafari, Shengling Ma, Barbara D Lam, Jun Y Jiang, Emily Zhou, Mrinal Ranjan, Justine Ryu, Raka Bandyo, Arash Maghsoudi, Bo Peng, Christopher I Amos, Abiodun Oluyomi, Nathanael R Fillmore, Jennifer La, Ang Li
Journal: Journal of Thrombosis and Haemostasis
Publication date: 2025-08-05
Publication type: Journal Article
DOI: 10.1016/j.jtha.2025.07.021 (https://doi.org/10.1016/j.jtha.2025.07.021)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360494/pdf/
Citations: 0

Abstract

Development and validation of venous thromboembolism-bidirectional encoder representations from transformers (VTE-BERT) natural language processing model.

Background: Accurate and rapid phenotyping of venous thromboembolism (VTE) in longitudinal studies is important. A natural language processing (NLP) tool externally validated in representative patients is lacking.

Objectives: To train and validate an efficient NLP model to detect incident VTE events.

Methods: We designed a novel NLP platform, NLPMed, to assist thrombosis researchers with data preprocessing, phenotype annotation, language model finetuning, and NLP application. Using clinical notes, discharge summaries, and radiology reports from patients with cancer at 2 healthcare institutions, we finetuned Bio_Clinical Bidirectional Encoder Representations from Transformers (BERT) to develop VTE-BERT. The new model was trained to detect acute VTE events and their anatomical locations longitudinally. We internally and externally validated the model's performance in 2 randomly sampled cohorts of patients with advanced cancer.

Results: The training cohort consisted of 715 patients and 14 013 annotated notes with ≥1 VTE keyword from the Harris Health System. The internal validation cohort included 400 additional patients with 7190 VTE keyword-containing notes from the Harris Health System. The external validation cohort included 400 patients with 7371 VTE keyword-containing notes from the national Veterans Affairs healthcare system. VTE-BERT was trained until it reached a precision of 95% and a recall of 98% at the patient level. On these independent datasets, the model achieved precision and recall of 95% and 91% in internal validation and of 85% and 92% in external validation.
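
The patient-level precision and recall reported above can be illustrated with a minimal Python sketch. The abstract does not say how note-level predictions are rolled up to patients, so the rule used here (a patient counts as positive if any of their notes is positive) is an assumption, and the function names and toy data are hypothetical rather than part of NLPMed or VTE-BERT.

```python
from collections import defaultdict


def patient_level(note_labels):
    """Collapse (patient_id, is_vte) note-level pairs to one flag per patient.

    Assumption: a patient is positive if at least one note is positive.
    """
    flags = defaultdict(bool)
    for patient_id, is_vte in note_labels:
        flags[patient_id] = flags[patient_id] or is_vte
    return dict(flags)


def precision_recall(truth, predicted):
    """Compute precision and recall over patients, given two {patient_id: bool} maps."""
    tp = sum(1 for p in truth if truth[p] and predicted.get(p, False))
    fp = sum(1 for p in predicted if predicted[p] and not truth.get(p, False))
    fn = sum(1 for p in truth if truth[p] and not predicted.get(p, False))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up data: manual annotations versus model output, one pair per note.
gold_notes = [("A", True), ("A", False), ("B", False), ("C", True), ("D", False)]
pred_notes = [("A", False), ("A", True), ("B", True), ("C", False), ("D", False)]

truth = patient_level(gold_notes)
predicted = patient_level(pred_notes)
precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, patient A is a true positive even though the model flagged a different note than the annotator did, which is why patient-level and note-level metrics can differ.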

Conclusion: We trained and externally validated an efficient NLP model to detect incident VTE events longitudinally. We believe its adoption will accelerate thrombosis research by improving VTE detection at scale and decreasing the time and expense involved with manual chart review in big data epidemiological studies.

Source journal
Journal of Thrombosis and Haemostasis (Medicine: Peripheral Vascular Disease)
CiteScore: 24.30
Self-citation rate: 3.80%
Articles published: 321
Review time: 1 month
About the journal: The Journal of Thrombosis and Haemostasis (JTH) serves as the official journal of the International Society on Thrombosis and Haemostasis. It is dedicated to advancing science related to thrombosis, bleeding disorders, and vascular biology through the dissemination and exchange of information and ideas within the global research community. Types of publications: original research reports, state-of-the-art reviews, brief reports, case reports, invited commentaries on publications in the Journal, forum articles, correspondence, and announcements. Scope of contributions: Editors invite contributions from both fundamental and clinical domains. These include basic manuscripts on blood coagulation and fibrinolysis; studies on proteins and reactions related to thrombosis and haemostasis; research on blood platelets and their interactions with other biological systems, such as the vessel wall, blood cells, and invading organisms; and clinical manuscripts covering topics including venous thrombosis, arterial disease, hemophilia, bleeding disorders, and platelet diseases. Clinical manuscripts may encompass etiology, diagnostics, prognosis, prevention, and treatment strategies.