Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.

IF 4.7 2区医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association Pub Date : 2025-01-01 DOI:10.1093/jamia/ocae169

Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti

{"title":"Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.","authors":"Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti","doi":"10.1093/jamia/ocae169","DOIUrl":null,"url":null,"abstract":"Objective: Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.Materials and methods: We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as \"definite access\", \"definitely no access\", or \"other\".Results: Firearm terms were identified in 36 685 patient records (41.3%), 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as \"other\". Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.Discussion and conclusion: Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"113-118"},"PeriodicalIF":4.7000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648724/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae169","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Objective: Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.

Materials and methods: We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as "definite access", "definitely no access", or "other".

Results: Firearm terms were identified in 36 685 patient records (41.3%), 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as "other". Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.

Discussion and conclusion: Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.

查看原文本刊更多论文

比较六种自然语言处理方法，以评估退伍军人健康管理局电子健康记录中的枪支使用情况。

目的：接触枪支与自杀风险增加有关。我们的目的是开发一种自然语言处理方法来描述临床记录中的枪支使用情况：我们使用了 2023 年 4 月 10 日至 2024 年 4 月 10 日期间 36 685 名退伍军人健康管理局（VHA）患者的临床记录。我们利用主题专家扩充了已有的枪支术语集，并围绕笔记中出现的每个枪支术语生成了 250 个字符的片段。注释者将 3000 个片段分为三类。利用这些标注片段，我们比较了四种非神经机器学习模型（随机森林、bagging、梯度提升、带山脊惩罚的逻辑回归）和两个版本的双向编码器表征转换器（Bidirectional Encoder Representations from Transformers，简称 BERT）（特别是 BioBERT 和 Bio-ClinicalBERT），以将枪支接触分为 "肯定接触"、"肯定不接触 "或 "其他"：在 36 685 份病历（41.3%）中识别出了枪支术语，33.7% 的片段被归类为明确接触枪支，9.0% 被归类为明确不接触枪支，57.2% 被归类为 "其他"。在对使用枪支进行分类的模型中，六个模型中有五个的性能可以接受，其中 BioBERT 和 Bio-ClinicalBERT 的性能最好，F1 分别为 0.876（95% 置信区间，0.874-0.879）和 0.896（95% 置信区间，0.894-0.899）：在退伍军人事务部患者的临床记录中，与枪支有关的术语很常见。利用文本识别和描述患者使用枪支情况的能力可以加强自杀预防工作，我们的六个模型中有五个模型可用于识别患者以进行临床干预。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of the American Medical Informatics Association 医学-计算机：跨学科应用

CiteScore

14.50

自引率

7.80%

发文量

230

审稿时长

3-8 weeks

期刊介绍： JAMIA is AMIA''s premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA''s articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.