Extracting Critical Information from Unstructured Clinicians' Notes Data to Identify Dementia Severity Using a Rule-Based Approach: Feasibility Study.

JMIR Aging (Impact Factor: 5.0; JCR Q1, Geriatrics & Gerontology)
Pub Date: 2024-09-24. DOI: 10.2196/57926
Ravi Prakash, Matthew E Dupre, Truls Østbye, Hanzhang Xu
Volume 7, article e57926. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11462099/pdf/
Citations: 0

Abstract

Background: The severity of Alzheimer disease and related dementias (ADRD) is rarely documented in structured data fields in electronic health records (EHRs). Although this information is important for clinical monitoring and decision-making, it is often undocumented or "hidden" in unstructured text fields and not readily available for clinicians to act upon.

Objective: We aimed to assess the feasibility and potential bias in using keywords and rule-based matching for obtaining information about the severity of ADRD from EHR data.

Methods: We used EHR data from a large academic health care system that included patients with a primary discharge diagnosis of ADRD based on ICD-9 (International Classification of Diseases, Ninth Revision) and ICD-10 (International Statistical Classification of Diseases, Tenth Revision) codes between 2014 and 2019. We first assessed the presence of ADRD severity information and then the severity of ADRD in the EHR. Clinicians' notes were used to determine the severity of ADRD based on two criteria: (1) scores from the Mini Mental State Examination and Montreal Cognitive Assessment and (2) explicit terms for ADRD severity (eg, "mild dementia" and "advanced Alzheimer disease"). We compiled a list of common ADRD symptoms, cognitive test names, and disease severity terms, refining it iteratively based on previous literature and clinical expertise. Subsequently, we applied rule-based matching in Python, using standard open-source data analysis libraries, to identify the context in which specific words or phrases were mentioned. We estimated the prevalence of documented ADRD severity and assessed the performance of our rule-based algorithm.
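
To make the two criteria concrete, the following is a minimal sketch of keyword and rule-based matching on a single note, using only Python's standard `re` module. It is not the authors' implementation: the term list, the regular expressions, the negation cues, and especially the score cut-offs in `ASSUMED_MILD_FLOOR` are illustrative assumptions, since the abstract does not report the actual keyword list or thresholds.

```python
import re

# Hypothetical mapping from severity words to the two documented categories.
SEVERITY_TERMS = {
    "mild": "mild",
    "moderate": "moderate_or_severe",
    "severe": "moderate_or_severe",
    "advanced": "moderate_or_severe",
}

# Criterion 2: an explicit severity word directly followed by a disease term.
TERM_RE = re.compile(
    r"\b(mild|moderate|severe|advanced)\s+"
    r"(?:dementia|alzheimer(?:'s)?(?: disease)?)",
    re.IGNORECASE,
)

# Criterion 1: a cognitive test name followed shortly by a score out of 30.
SCORE_RE = re.compile(
    r"\b(MMSE|MoCA)\b[^0-9\n]{0,20}?(\d{1,2})\s*/\s*30",
    re.IGNORECASE,
)

# Simple context rule: a negation cue earlier in the same sentence.
NEGATION_RE = re.compile(
    r"\b(?:no|not|without|denies)\b[^.]{0,30}$", re.IGNORECASE
)

# ASSUMPTION: illustrative cut-offs only, not taken from the study.
# Scores at or above the floor are labeled mild, lower scores moderate_or_severe.
ASSUMED_MILD_FLOOR = {"mmse": 21, "moca": 18}


def classify_note(text):
    """Return 'mild', 'moderate_or_severe', or None if no severity is found."""
    labels = []
    for match in TERM_RE.finditer(text):
        prefix = text[max(0, match.start() - 40):match.start()]
        if NEGATION_RE.search(prefix):
            continue  # skip mentions such as "no evidence of severe dementia"
        labels.append(SEVERITY_TERMS[match.group(1).lower()])
    for match in SCORE_RE.finditer(text):
        test_name, score = match.group(1).lower(), int(match.group(2))
        if score > 30:
            continue  # not a valid score on a 30-point scale
        floor = ASSUMED_MILD_FLOOR[test_name]
        labels.append("mild" if score >= floor else "moderate_or_severe")
    if "moderate_or_severe" in labels:
        return "moderate_or_severe"  # keep the most severe finding in the note
    return "mild" if labels else None
```

Calling `classify_note("Patient has mild dementia. MMSE 24/30.")` exercises both criteria on one note. A patient-level label would still need a rule for combining notes over time, which this sketch leaves out.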

Results: We included 9115 eligible patients with over 65,000 notes from the providers. Overall, 22.93% (2090/9115) of patients were documented with mild ADRD, 20.87% (1902/9115) were documented with moderate or severe ADRD, and 56.20% (5123/9115) did not have any documentation of the severity of their ADRD. For the task of determining the presence of any ADRD severity information, our algorithm achieved an accuracy of >95%, specificity of >95%, sensitivity of >90%, and an F1-score of >83%. For the specific task of identifying the actual severity of ADRD, the algorithm performed well with an accuracy of >91%, specificity of >80%, sensitivity of >88%, and F1-score of >92%. Compared with patients with mild ADRD, those with more advanced ADRD tended to be older, were more likely to be female and Black, and were more likely to have received their diagnoses in primary care or in-hospital settings. Relative to patients with undocumented ADRD severity, those with documented ADRD severity had a similar distribution in terms of sex, race, and rural or urban residence.
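
For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and the F1-score follow from the four cells of a confusion matrix, and how the prevalence percentages follow from the patient counts given above. The function names are hypothetical, and the abstract does not report the underlying confusion-matrix counts, so none are reproduced here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive and negative counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)      # recall: share of true cases detected
    specificity = tn / (tn + fp)      # share of non-cases correctly rejected
    precision = tp / (tp + fp)        # share of flagged cases that are real
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def prevalence_percent(count, total):
    """Percentage of patients in a category, rounded to two decimals."""
    return round(100 * count / total, 2)


# Patient counts as reported in the Results.
TOTAL_PATIENTS = 9115
REPORTED_COUNTS = {"mild": 2090, "moderate_or_severe": 1902, "undocumented": 5123}

prevalence = {
    label: prevalence_percent(count, TOTAL_PATIENTS)
    for label, count in REPORTED_COUNTS.items()
}
```

Because the F1-score ignores true negatives, it can sit well below accuracy and specificity when the positive class is the minority, which is one way to read an F1-score of >83% alongside an accuracy of >95%.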

Conclusions: Our study demonstrates the feasibility of using a rule-based matching algorithm to identify ADRD severity from unstructured EHR report data. However, it is essential to acknowledge potential biases arising from differences in documentation practices across various health care systems.
