Natural Language Processing for Identification of Hospitalized People Who Use Drugs: Cohort Study.

JMIR AI. Published 2025-07-18. DOI: 10.2196/63147
Taisuke Sato, Emily D Grussing, Ruchi Patel, Jessica Ridgway, Joji Suzuki, Benjamin Sweigart, Robert Miller, Alysse G Wurcel
JMIR AI, vol. 4, e63147.

Abstract

Background: People who use drugs (PWUD) are at heightened risk of severe injection-related infections. Current research relies on billing codes to identify PWUD, a methodology with suboptimal accuracy that may underestimate the economic, racial, and ethnic diversity of hospitalized PWUD.

Objective: This study examines how natural language processing (NLP) can improve identification of PWUD in electronic medical records, with a specific focus on identifying populations who may previously have been missed, including people with low income and people from racially and ethnically minoritized populations.

Methods: Health informatics specialists assisted in querying a cohort of likely PWUD hospital admissions at Tufts Medical Center between 2020 and 2022 using the following criteria: (1) ICD-10 codes indicative of drug use, (2) positive drug toxicology results, (3) prescriptions for medications for opioid use disorder, and (4) NLP-detected presence of "token" keywords in the electronic medical record likely indicating that the patient was a PWUD. Hospital admissions were split into two groups: highly documented (all four criteria present) and minimally documented (NLP criterion only). These groups were compared to assess the impact of race, ethnicity, and social vulnerability index. Positive predictive value was calculated using chart review as the "gold standard."
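The four entry criteria and the two documentation groups can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the abstract does not publish the keyword list or matching rules, so `HYPOTHETICAL_TOKENS`, the `Admission` record, and the rule that an admission joins the cohort when it meets at least one criterion are all assumptions. The matcher is a plain case-insensitive keyword search with word boundaries and does not handle negation (for example, "denies IVDU" still counts as a hit).

```python
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL token list for illustration; the study's actual keywords are not given.
HYPOTHETICAL_TOKENS = ["heroin", "fentanyl", "cocaine", "ivdu", "pwid", "injection drug use"]

# One case-insensitive pattern; \b word boundaries prevent partial-word hits
# (e.g. "heroin" will not match inside "heroine").
TOKEN_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in HYPOTHETICAL_TOKENS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Admission:
    """Hypothetical per-admission record combining structured and free-text data."""
    admission_id: str
    drug_icd10_code: bool      # criterion 1: ICD-10 code indicative of drug use
    positive_toxicology: bool  # criterion 2: positive drug toxicology result
    moud_prescription: bool    # criterion 3: prescription for medications for opioid use disorder
    note_text: str             # free text searched for criterion 4 (NLP tokens)


def nlp_token_present(text: str) -> bool:
    """Criterion 4: True if any keyword token appears in the note text."""
    return TOKEN_PATTERN.search(text) is not None


def documentation_group(adm: Admission) -> Optional[str]:
    """Assign an admission to a documentation group, or None if it meets no criterion."""
    structured = [adm.drug_icd10_code, adm.positive_toxicology, adm.moud_prescription]
    nlp = nlp_token_present(adm.note_text)
    if nlp and all(structured):
        return "highly documented"      # all four criteria present
    if nlp and not any(structured):
        return "minimally documented"   # NLP token presence alone
    if nlp or any(structured):
        return "other cohort member"    # assumed: any single criterion admits to the cohort
    return None


if __name__ == "__main__":
    example = Admission("A1", False, False, False, "Pt reports fentanyl use; denies IVDU.")
    # Prints "minimally documented"; the negated "denies IVDU" would also have matched.
    print(documentation_group(example))
```

Word boundaries are the cheapest guard against false hits inside longer words; handling negation and clinical context would require more than a keyword search.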

Results: The cohort included 4548 hospital admissions, with broad heterogeneity in how people entered the cohort and subcohorts; 288 admissions entered the cohort through NLP token presence alone. NLP demonstrated a positive predictive value of 54%, outperforming biomarkers, prescriptions for medications for opioid use disorder, and ICD codes in identifying hospitalizations of PWUD. Additionally, integrating NLP into the identification algorithm significantly enhanced these methods. The study also found that people from racially and ethnically minoritized communities and those with a lower social vulnerability index were significantly more likely to have less PWUD-related documentation.
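Positive predictive value for an identification method is the share of the admissions it flags that chart review confirms, PPV = TP / (TP + FP). A minimal Python sketch follows; the record layout and method labels (`"nlp"`, `"icd10"`, and so on) are hypothetical, and the toy records are invented for illustration rather than taken from the study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ReviewedAdmission:
    """Hypothetical record: which methods flagged an admission, and the chart-review verdict."""
    flagged_by: FrozenSet[str]  # e.g. frozenset({"nlp", "icd10"}); labels are illustrative
    confirmed_pwud: bool        # gold standard: chart review confirms the patient is a PWUD


def positive_predictive_value(admissions: Iterable[ReviewedAdmission], method: str) -> float:
    """PPV = true positives / (true positives + false positives) among admissions a method flags."""
    flagged = [a for a in admissions if method in a.flagged_by]
    if not flagged:
        raise ValueError(f"no admissions flagged by {method!r}")
    true_positives = sum(1 for a in flagged if a.confirmed_pwud)
    return true_positives / len(flagged)


if __name__ == "__main__":
    # Invented toy data, not study results.
    toy = [
        ReviewedAdmission(frozenset({"nlp"}), True),
        ReviewedAdmission(frozenset({"nlp"}), False),
        ReviewedAdmission(frozenset({"nlp", "icd10"}), True),
        ReviewedAdmission(frozenset({"icd10"}), False),
    ]
    print(positive_predictive_value(toy, "nlp"))    # 2 of 3 flagged admissions confirmed
    print(positive_predictive_value(toy, "icd10"))  # 1 of 2 flagged admissions confirmed
```

Because PPV only considers flagged admissions, it says nothing about how many PWUD a method misses; that would require sensitivity, which needs a reviewed sample of unflagged admissions as well.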

Conclusions: NLP proved effective in identifying hospitalizations of PWUD, surpassing traditional methods. While further refinement is needed, NLP shows promise for reducing health care disparities.
