Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system.

IF 2.5 Q2 HEALTH CARE SCIENCES & SERVICES
JAMIA Open Pub Date : 2023-10-04 eCollection Date: 2023-12-01 DOI:10.1093/jamiaopen/ooad085
Geoffrey M Gray, Ayah Zirikly, Luis M Ahumada, Masoud Rouhizadeh, Thomas Richards, Christopher Kitchen, Iman Foroughmand, Elham Hatef

Abstract

Objectives: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs).

Materials and methods: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. For feature development, we combined manual lexicon curation with semiautomated lexicon creation. We developed an initial rule-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm on an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested adjustments to the keywords by evaluating the false positives and false negatives identified in the labeled dataset. We assessed algorithm performance using precision, recall, and F1 score.
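
As a rough illustration of the rule-based keyword matching described above, the following Python sketch flags social needs domains in free-text notes. It is not the study's Match Pipeline: the lexicon phrases, the domain keys, and the function names (`compile_lexicon`, `match_note`, `flag_patient`) are all hypothetical, and the sketch assumes simple case-insensitive, whole-phrase matching with no negation handling.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicons for illustration only; these are NOT the keyword
# sets used in the study.
LEXICONS: Dict[str, List[str]] = {
    "residential_instability": ["homeless", "living in a shelter", "evicted"],
    "food_insecurity": ["food insecurity", "food pantry", "skipping meals"],
    "transportation_issues": ["no transportation", "lacks transportation"],
}


def compile_lexicon(phrases: List[str]) -> "re.Pattern[str]":
    """Build one case-insensitive, whole-phrase pattern from a keyword list."""
    # Longer phrases first so the alternation prefers the most specific match.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERNS = {domain: compile_lexicon(p) for domain, p in LEXICONS.items()}


def match_note(note: str) -> Dict[str, List[str]]:
    """Return the matched keywords per domain for a single note."""
    hits: Dict[str, List[str]] = {}
    for domain, pattern in PATTERNS.items():
        found = [m.group(0).lower() for m in pattern.finditer(note)]
        if found:
            hits[domain] = found
    return hits


def flag_patient(notes: List[str]) -> Set[str]:
    """Flag a patient for every domain matched in at least one note."""
    flagged: Set[str] = set()
    for note in notes:
        flagged.update(match_note(note))
    return flagged
```

Reviewing false positives and false negatives against labeled notes, as the abstract describes, would then amount to editing the phrase lists and rerunning the matcher.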

Results: The algorithm for identifying residential instability had the best overall performance, with weighted-average precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high, but the transportation issues algorithm had the lowest overall performance.
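
The abstract does not state how the weighted averages were computed. One common convention, assumed here, weights each class's precision, recall, and F1 by its share of the true labels (the same idea as `average="weighted"` in scikit-learn). The function name `weighted_prf` and the toy labels below are illustrative, not data from the study.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Support-weighted precision, recall, and F1 over all true classes."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    scores = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = n / total  # class share of the labeled data
        scores["precision"] += weight * precision
        scores["recall"] += weight * recall
        scores["f1"] += weight * f1
    return scores


# Toy example: 1 = need documented, 0 = not documented (made-up labels).
result = weighted_prf([1, 1, 1, 0], [1, 1, 0, 0])
```

Under this convention the weighted F1 is an average of per-class F1 scores, so it need not equal the harmonic mean of the weighted precision and weighted recall.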

Discussion: The NLP algorithm for identifying social needs at JHHS performed relatively well and provides an opportunity for implementation in a healthcare system.

Conclusion: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.
