An Extraction Tool for Venous Thromboembolism Symptom Identification in Primary Care Notes to Facilitate Electronic Clinical Quality Measure Reporting: Algorithm Development and Validation Study.

IF 3.8 3区医学 Q2 MEDICAL INFORMATICS

JMIR Medical Informatics Pub Date : 2025-08-26 DOI:10.2196/63720

John Novoa-Laurentiev, Mica Bowen, Avery Pullman, Wenyu Song, Ania Syrowatka, Jin Chen, Michael Sainlaire, Frank Chang, Krissy Gray, Purushottam Panta, Luwei Liu, Khalid Nawab, Shadi Hijjawi, Richard Schreiber, Li Zhou, Patricia C Dykes

{"title":"An Extraction Tool for Venous Thromboembolism Symptom Identification in Primary Care Notes to Facilitate Electronic Clinical Quality Measure Reporting: Algorithm Development and Validation Study.","authors":"John Novoa-Laurentiev, Mica Bowen, Avery Pullman, Wenyu Song, Ania Syrowatka, Jin Chen, Michael Sainlaire, Frank Chang, Krissy Gray, Purushottam Panta, Luwei Liu, Khalid Nawab, Shadi Hijjawi, Richard Schreiber, Li Zhou, Patricia C Dykes","doi":"10.2196/63720","DOIUrl":null,"url":null,"abstract":"Background: Diagnosis of venous thromboembolism (VTE) is often delayed, and facilitating earlier diagnosis may improve associated morbidity and mortality. Clinical notes contain information not found elsewhere in the medical record that could facilitate timely VTE diagnosis and accurate quality measurement. However, extracting relevant information from unstructured clinical notes is complex. Today, there are relatively few electronic clinical quality measures (eCQMs) in our national payment program and none that use natural language processing (NLP) techniques for data extraction. NLP holds great promise for making quality measurement more accurate and more efficient. Given the potential of NLP-based applications to facilitate more accurate VTE detection, primary care is one clinical setting in urgent need of this type of tool.Objective: This study aimed to develop a tool that extracts VTE symptoms from clinical notes for use within an eCQM to quantify the rate of delayed diagnosis of VTE in primary care settings.Methods: We iteratively developed an NLP-based data extraction tool, venous thromboembolism symptom extractor (VTExt), on an internal dataset using a rule-based approach to extract VTE symptoms from primary care clinical note text. The VTE symptoms lexicon was derived and optimized with physician guidance and externally validated using datasets from 2 independent health care organizations. We performed 26 rounds of performance evaluation of notes sampled from the case cohort (17,585 patient progress note sentences from 279 patient notes), and 5 rounds of evaluation of the control cohort (2838 patient progress note sentences from 50 patient notes). VTExt's performance was evaluated using evaluation metrics, including area under the curve, positive predictive value, negative predictive value, sensitivity, and specificity.Results: VTExt achieved near-perfect performance in extracting VTE symptoms from primary care notes sampled from records of patients diagnosed with or without VTE. In external validation, VTExt achieved promising performance in 2 additional geographically distant organizations using different electronic health record systems. When compared against a deep learning model and 4 machine learning models, VTExt exhibited similar or even improved performance across all metrics.Conclusions: This study demonstrates a data-driven NLP-based approach to clinical note information extraction that can be generalized to different electronic health record systems across different institutions. Due to the robust performance of this tool, VTExt is the first NLP application to be used in a nationally endorsed eCQM.","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e63720"},"PeriodicalIF":3.8000,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387394/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/63720","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}

引用次数: 0

Abstract

Background: Diagnosis of venous thromboembolism (VTE) is often delayed, and facilitating earlier diagnosis may improve associated morbidity and mortality. Clinical notes contain information not found elsewhere in the medical record that could facilitate timely VTE diagnosis and accurate quality measurement. However, extracting relevant information from unstructured clinical notes is complex. Today, there are relatively few electronic clinical quality measures (eCQMs) in our national payment program and none that use natural language processing (NLP) techniques for data extraction. NLP holds great promise for making quality measurement more accurate and more efficient. Given the potential of NLP-based applications to facilitate more accurate VTE detection, primary care is one clinical setting in urgent need of this type of tool.

Objective: This study aimed to develop a tool that extracts VTE symptoms from clinical notes for use within an eCQM to quantify the rate of delayed diagnosis of VTE in primary care settings.

Methods: We iteratively developed an NLP-based data extraction tool, venous thromboembolism symptom extractor (VTExt), on an internal dataset using a rule-based approach to extract VTE symptoms from primary care clinical note text. The VTE symptoms lexicon was derived and optimized with physician guidance and externally validated using datasets from 2 independent health care organizations. We performed 26 rounds of performance evaluation of notes sampled from the case cohort (17,585 patient progress note sentences from 279 patient notes), and 5 rounds of evaluation of the control cohort (2838 patient progress note sentences from 50 patient notes). VTExt's performance was evaluated using evaluation metrics, including area under the curve, positive predictive value, negative predictive value, sensitivity, and specificity.

Results: VTExt achieved near-perfect performance in extracting VTE symptoms from primary care notes sampled from records of patients diagnosed with or without VTE. In external validation, VTExt achieved promising performance in 2 additional geographically distant organizations using different electronic health record systems. When compared against a deep learning model and 4 machine learning models, VTExt exhibited similar or even improved performance across all metrics.

Conclusions: This study demonstrates a data-driven NLP-based approach to clinical note information extraction that can be generalized to different electronic health record systems across different institutions. Due to the robust performance of this tool, VTExt is the first NLP application to be used in a nationally endorsed eCQM.

Abstract Image

查看原文本刊更多论文

一种用于初级保健记录中静脉血栓栓塞症状识别的提取工具，以促进电子临床质量测量报告：算法开发和验证研究。

背景：静脉血栓栓塞（VTE）的诊断经常被延迟，促进早期诊断可能会改善相关的发病率和死亡率。临床记录包含在其他医疗记录中找不到的信息，可以促进及时的静脉血栓栓塞诊断和准确的质量测量。然而，从非结构化的临床记录中提取相关信息是复杂的。今天，在我们的国家支付计划中，电子临床质量测量（eCQMs）相对较少，而且没有一个使用自然语言处理（NLP）技术进行数据提取。NLP在提高质量测量的准确性和效率方面有着巨大的希望。鉴于基于nlp的应用在促进更准确的静脉血栓栓塞检测方面的潜力，初级保健是迫切需要这种类型工具的临床环境。目的：本研究旨在开发一种工具，从临床记录中提取静脉血栓栓塞症状，用于eCQM，以量化初级保健机构静脉血栓栓塞的延迟诊断率。方法：我们迭代开发了一个基于nlp的数据提取工具，静脉血栓栓塞症状提取器（VTExt），该工具基于内部数据集，使用基于规则的方法从初级保健临床记录文本中提取静脉血栓栓塞症状。静脉血栓栓塞症状词典是在医生指导下导出和优化的，并使用来自2个独立医疗机构的数据集进行外部验证。我们对从病例队列中抽取的笔记进行了26轮评估（从279份患者笔记中抽取17,585份患者病程记录句子），并对对照队列进行了5轮评估（从50份患者笔记中抽取2838份患者病程记录句子）。使用评价指标对VTExt的性能进行评价，包括曲线下面积、阳性预测值、阴性预测值、敏感性和特异性。结果：VTExt在从诊断为静脉血栓栓塞或不诊断为静脉血栓栓塞的患者的记录中取样的初级保健笔记中提取静脉血栓栓塞症状方面取得了近乎完美的表现。在外部验证中，VTExt在另外两个地理位置遥远的组织中使用不同的电子健康记录系统取得了良好的性能。当与深度学习模型和4种机器学习模型进行比较时，VTExt在所有指标上都表现出相似甚至更高的性能。结论：本研究展示了一种基于数据驱动的nlp临床记录信息提取方法，该方法可以推广到不同机构的不同电子健康记录系统。由于该工具的强大性能，VTExt是第一个在国家认可的eCQM中使用的NLP应用程序。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

JMIR Medical Informatics Medicine-Health Informatics

CiteScore

7.90

自引率

3.10%

发文量

173

审稿时长

12 weeks

期刊介绍： JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.