Extracting Cognitive Impairment Assessment Information From Unstructured Notes in Electronic Health Records Using Natural Language Processing Tools: Validation with Clinical Assessment Data.
IF 3.4 · Zone 2 (Medicine) · Q1 · Public, Environmental & Occupational Health
Kuan-Yuan Wang, Mufaddal Mahesri, John Novoa-Laurentiev, Lily G Bessette, Cassandra York, Heidi Zakoul, Su Been Lee, Kerry Ngan, Li Zhou, Dae Hyun Kim, Kueiyu Joshua Lin
Clinical Epidemiology, volume 17, pages 353-365. Published 2025-04-15. DOI: 10.2147/CLEP.S504259 (https://doi.org/10.2147/CLEP.S504259). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12009745/pdf/
Abstract
Purpose: We aimed to develop a Natural Language Processing (NLP) algorithm to extract cognitive scores from electronic health records (EHR) data and compare them with cognitive function recorded by Centers for Medicare & Medicaid Services (CMS)-mandated clinical assessments in nursing homes and home health visits.
Patients and methods: We identified a cohort of Medicare beneficiaries who had either the Minimum Data Set (MDS) or Outcome and Assessment Information Set (OASIS) linked to EHR data from the Research Patient Data Registry (Mass General Brigham, Boston, MA) from 2010 to 2019. We applied an NLP approach to identify Montreal Cognitive Assessment (MoCA) and Mini-Mental State Examination (MMSE) scores in unstructured clinician notes in the EHR. We then estimated the mean difference in NLP-extracted MoCA or MMSE scores by cognition status, as determined by MDS (impaired vs intact cognition) and OASIS (severe impairment vs intact cognition) data.
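The abstract does not describe the extraction rules themselves. As a rough illustration only, the sketch below shows one way a rule-based extractor could pull MoCA and MMSE scores from free-text notes. The pattern, the names `extract_scores` and `normalize_test_name`, and the sample note are all hypothetical and not taken from the study; the only fixed fact used is that both tests are scored from 0 to 30.

```python
import re
from typing import List, Tuple

# Hypothetical pattern (an assumption, not the study's algorithm):
# a test name, up to 40 non-digit characters of filler, then a 1-2 digit
# score that may be written as "score/30".
SCORE_PATTERN = re.compile(
    r"\b(?P<test>MoCA|MMSE|Montreal Cognitive Assessment"
    r"|Mini[- ]Mental State Exam(?:ination)?)\b"
    r"[^\d\n]{0,40}?"
    r"(?P<score>\d{1,2})(?:\s*/\s*30)?(?!\d)",
    re.IGNORECASE,
)


def normalize_test_name(name: str) -> str:
    """Map any matched spelling to the short label 'MoCA' or 'MMSE'."""
    return "MoCA" if name.lower().startswith("mo") else "MMSE"


def extract_scores(note: str) -> List[Tuple[str, int]]:
    """Return (test, score) pairs found in one note, keeping scores in 0-30."""
    results = []
    for match in SCORE_PATTERN.finditer(note):
        score = int(match.group("score"))
        if 0 <= score <= 30:  # both MoCA and MMSE have a maximum of 30
            results.append((normalize_test_name(match.group("test")), score))
    return results


if __name__ == "__main__":
    # Made-up note text for illustration only.
    note = "Pt seen for memory concerns. MoCA score: 22/30. MMSE was 27 last year."
    print(extract_scores(note))
```

A pattern this simple would misread a date or subscore that directly follows a test name, so the output of any such rule would need to be checked against manually reviewed notes before use.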
Results: The study cohort comprised 7419 patients with MDS (19.7%) or OASIS (80.3%) assessments; mean age was 80 years (SD=7), and 60% were female. In the EHR, the NLP algorithm extracted cognitive test scores with 97% accuracy (95% CI: 92-99%) for MoCA and 100% accuracy (95% CI: 84-100%) for MMSE. In MDS, the mean difference in extracted MoCA was -5.6 (95% CI: -8.7, -2.4, p=0.0008), and the mean difference in extracted MMSE was -7.9 (95% CI: -12.4, -3.5, p=0.0012). In OASIS, the mean differences in extracted MoCA and extracted MMSE were -4.8 (95% CI: -9.1, -0.6, p=0.0006) and -4.5 (95% CI: -9.5, -0.5, p=0.0182), respectively.
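To make the group comparison concrete, the sketch below computes a difference in mean scores between two groups with a large-sample (normal-approximation) 95% confidence interval based on unequal-variance standard errors. The abstract does not state which statistical method was used, so this choice is an assumption; the function name `mean_difference_ci` and the score lists are hypothetical illustrations, not study data.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def mean_difference_ci(
    impaired: Sequence[float],
    intact: Sequence[float],
    level: float = 0.95,
) -> Tuple[float, float, float]:
    """Return (difference, lower, upper) for mean(impaired) - mean(intact).

    Uses a normal approximation; with small groups a t-based interval
    would be wider. Each group needs at least two observations.
    """
    diff = mean(impaired) - mean(intact)
    # Standard error without assuming equal variances in the two groups.
    se = sqrt(variance(impaired) / len(impaired) + variance(intact) / len(intact))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    # Made-up extracted scores for illustration only.
    impaired_scores = [18, 20, 16, 22, 19]
    intact_scores = [25, 27, 24, 26, 28]
    diff, lower, upper = mean_difference_ci(impaired_scores, intact_scores)
    print(f"mean difference = {diff:.1f} (95% CI: {lower:.1f}, {upper:.1f})")
```

With this sign convention, a negative difference means the impaired group scored lower on average than the intact group.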
Conclusion: We developed an NLP algorithm to accurately extract cognitive scores from unstructured EHR notes, and these extracted cognitive scores were well correlated with cognitive function recorded in CMS-mandated clinical assessments. This could help researchers identify patients with various degrees of cognitive impairment in EHR-based research.
About the journal:
Clinical Epidemiology is an international, peer-reviewed, open access journal. Clinical Epidemiology focuses on the application of epidemiological principles and questions relating to patients and clinical care in terms of prevention, diagnosis, prognosis, and treatment.
Clinical Epidemiology welcomes papers covering these topics in the form of original research and systematic reviews.
Clinical Epidemiology has a special interest in international electronic medical patient records and other routine health care data, especially as applied to safety of medical interventions, clinical utility of diagnostic procedures, understanding short- and long-term clinical course of diseases, clinical epidemiological and biostatistical methods, and systematic reviews.
When considering submission of a paper utilizing publicly available data, authors should ensure that such studies add significantly to the body of knowledge and that they use appropriate validated methods for identifying health outcomes.
The journal has launched special series describing existing data sources for clinical epidemiology, international health care systems, and validation studies of algorithms based on databases and registries.