Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti
{"title":"Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.","authors":"Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti","doi":"10.1093/jamia/ocae169","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.</p><p><strong>Materials and methods: </strong>We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as \"definite access\", \"definitely no access\", or \"other\".</p><p><strong>Results: </strong>Firearm terms were identified in 36 685 patient records (41.3%), 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as \"other\". Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.</p><p><strong>Discussion and conclusion: </strong>Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae169","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Objective: Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.
Materials and methods: We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as "definite access", "definitely no access", or "other".
Results: Firearm terms were identified in 36 685 patient records (41.3%), 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as "other". Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.
Discussion and conclusion: Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.
期刊介绍:
JAMIA is AMIA''s premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA''s articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.