Betina Idnay, Gongbo Zhang, Fangyi Chen, Casey N Ta, Matthew W Schelke, Karen Marder, Chunhua Weng
{"title":"Mini-mental status examination phenotyping for Alzheimer's disease patients using both structured and narrative electronic health record features.","authors":"Betina Idnay, Gongbo Zhang, Fangyi Chen, Casey N Ta, Matthew W Schelke, Karen Marder, Chunhua Weng","doi":"10.1093/jamia/ocae274","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>This study aims to automate the prediction of Mini-Mental State Examination (MMSE) scores, a widely adopted standard for cognitive assessment in patients with Alzheimer's disease, using natural language processing (NLP) and machine learning (ML) on structured and unstructured EHR data.</p><p><strong>Materials and methods: </strong>We extracted demographic data, diagnoses, medications, and unstructured clinical visit notes from the EHRs. We used Latent Dirichlet Allocation (LDA) for topic modeling and Term-Frequency Inverse Document Frequency (TF-IDF) for n-grams. In addition, we extracted meta-features such as age, ethnicity, and race. Model training and evaluation employed eXtreme Gradient Boosting (XGBoost), Stochastic Gradient Descent Regressor (SGDRegressor), and Multi-Layer Perceptron (MLP).</p><p><strong>Results: </strong>We analyzed 1654 clinical visit notes collected between September 2019 and June 2023 for 1000 Alzheimer's disease patients. The average MMSE score was 20, with patients averaging 76.4 years old, 54.7% female, and 54.7% identifying as White. The best-performing model (ie, lowest root mean squared error (RMSE)) is MLP, which achieved an RMSE of 5.53 on the validation set using n-grams, indicating superior prediction performance over other models and feature sets. The RMSE on the test set was 5.85.</p><p><strong>Discussion: </strong>This study developed a ML method to predict MMSE scores from unstructured clinical notes, demonstrating the feasibility of utilizing NLP to support cognitive assessment. Future work should focus on refining the model and evaluating its clinical relevance across diverse settings.</p><p><strong>Conclusion: </strong>We contributed a model for automating MMSE estimation using EHR features, potentially transforming cognitive assessment for Alzheimer's patients and paving the way for more informed clinical decisions and cohort identification.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae274","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Objective: This study aims to automate the prediction of Mini-Mental State Examination (MMSE) scores, a widely adopted standard for cognitive assessment in patients with Alzheimer's disease, using natural language processing (NLP) and machine learning (ML) on structured and unstructured EHR data.
Materials and methods: We extracted demographic data, diagnoses, medications, and unstructured clinical visit notes from the EHRs. We used Latent Dirichlet Allocation (LDA) for topic modeling and Term-Frequency Inverse Document Frequency (TF-IDF) for n-grams. In addition, we extracted meta-features such as age, ethnicity, and race. Model training and evaluation employed eXtreme Gradient Boosting (XGBoost), Stochastic Gradient Descent Regressor (SGDRegressor), and Multi-Layer Perceptron (MLP).
Results: We analyzed 1654 clinical visit notes collected between September 2019 and June 2023 for 1000 Alzheimer's disease patients. The average MMSE score was 20, with patients averaging 76.4 years old, 54.7% female, and 54.7% identifying as White. The best-performing model (ie, lowest root mean squared error (RMSE)) is MLP, which achieved an RMSE of 5.53 on the validation set using n-grams, indicating superior prediction performance over other models and feature sets. The RMSE on the test set was 5.85.
Discussion: This study developed a ML method to predict MMSE scores from unstructured clinical notes, demonstrating the feasibility of utilizing NLP to support cognitive assessment. Future work should focus on refining the model and evaluating its clinical relevance across diverse settings.
Conclusion: We contributed a model for automating MMSE estimation using EHR features, potentially transforming cognitive assessment for Alzheimer's patients and paving the way for more informed clinical decisions and cohort identification.
期刊介绍:
JAMIA is AMIA''s premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA''s articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.