Dual-stream algorithms for dementia detection: Harnessing structured and unstructured electronic health record data, a novel approach to prevalence estimation
Taya A. Collyer, Ming Liu, Richard Beare, Nadine E. Andrew, David Ung, Alison Carver, Jenni Ilomaki, J. Simon Bell, Amanda G. Thrift, Walter A. Rocca, Jennifer L. St Sauver, Alicia Lu, Kristy Siostrom, Chris Moran, Helene Roberts, Trevor T.-J. Chong, Anne Murray, Tanya Ravipati, Bridget O'Bree, Velandai K. Srikanth
Alzheimer's & Dementia, 21(5). Published 2025-05-05. DOI: 10.1002/alz.70132
https://onlinelibrary.wiley.com/doi/10.1002/alz.70132
Abstract
INTRODUCTION
Identifying individuals with dementia is crucial for prevalence estimation and service planning, but reliable, scalable methods are lacking. We developed a novel set of algorithms using both structured and unstructured electronic health record (EHR) data, applying Diagnostic and Statistical Manual of Mental Disorders criteria for dementia case identification.
METHODS
Our cohort (n = 1082) included individuals aged ≥ 60 years with dementia identified through specialist clinics and a comparison group without dementia. Clinicians from Australia and the United States informed predictor selection. We developed algorithms through a biostatistics stream for structured data and a natural language processing (NLP) stream for text, synthesizing results via logistic regression.
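The idea of synthesizing two streams via logistic regression can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the names `structured_score` and `text_score` are hypothetical stand-ins for each stream's per-person output, and the plain gradient-descent fit is an assumption chosen only to keep the example free of dependencies.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, row):
    """Predicted probability for one row; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic, illustrative data only (not from the study).
random.seed(0)
labels = [i % 2 for i in range(200)]  # 1 = dementia, 0 = comparison
# Hypothetical per-person scores emitted by each stream.
structured_score = [random.gauss(0.6 if y else 0.4, 0.15) for y in labels]
text_score = [random.gauss(0.65 if y else 0.35, 0.15) for y in labels]

# Second-stage model: both stream outputs become predictors of the outcome.
stacked_X = [[s, t] for s, t in zip(structured_score, text_score)]
weights = fit_logistic(stacked_X, labels)
combined = [predict_proba(weights, row) for row in stacked_X]

# In-sample accuracy at a 0.5 threshold, for illustration only.
accuracy = sum((p >= 0.5) == bool(y) for p, y in zip(combined, labels)) / len(labels)
```

In this sketch the second-stage coefficients indicate how much each stream contributes to the combined prediction. A real analysis would evaluate the combined model on held-out data rather than in-sample.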
RESULTS
The final structured model retained 16 variables (area under the receiver operating characteristic curve [AUC] 0.853, specificity 72.2%, sensitivity 80.6%). NLP classifiers (logistic regression, support vector machine, and random forest models) performed comparably. The final, combined model outperformed all others (AUC = 0.951, P < 0.001 for comparison to structured model).
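For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUC are conventionally computed from true labels and model scores. The labels, scores, and 0.5 threshold are invented for illustration and have no connection to the study's data or chosen cut-points.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: four cases (1) and four non-cases (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)  # 0.75, 0.75
area = auc(y_true, scores)  # 13 of 16 case/non-case pairs ranked correctly = 0.8125
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports both.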
DISCUSSION
Embedding text-derived insights within algorithms trained on structured medical data significantly enhances dementia identification capacity.
Highlights
Algorithmic tools for detection of individuals with dementia are available; however, previous work has used heterogeneous case definitions which are not clinically meaningful, and has relied on proxies such as diagnostic codes or medications for case ascertainment.
We used a novel, dual-stream algorithmic development approach, simultaneously and separately modeling a clinically meaningful outcome (diagnosis of dementia according to specialized clinical impression) using structured and unstructured electronic health record datasets.
Our clinically grounded case definition supported the inclusion of key structured variables (such as dementia International Classification of Disease codes and medications) as modeling predictors rather than outcomes.
Our algorithms, published in detail to support validation and replication, represent a major step forward in the use of routinely collected data for detection of diagnosed dementia.
About the journal:
Alzheimer's & Dementia is a peer-reviewed journal that aims to bridge knowledge gaps in dementia research by covering the entire spectrum, from basic science to clinical trials to social and behavioral investigations. It provides a platform for rapid communication of new findings and ideas, optimal translation of research into practical applications, increasing knowledge across diverse disciplines for early detection, diagnosis, and intervention, and identifying promising new research directions. In July 2008, Alzheimer's & Dementia was accepted for indexing by MEDLINE, recognizing its scientific merit and contribution to Alzheimer's research.