Taya A. Collyer, Ming Liu, Richard Beare, Nadine E. Andrew, David Ung, Alison Carver, Jenni Ilomaki, J. Simon Bell, Amanda G. Thrift, Walter A. Rocca, Jennifer L. St Sauver, Alicia Lu, Kristy Siostrom, Chris Moran, Helene Roberts, Trevor T.-J. Chong, Anne Murray, Tanya Ravipati, Bridget O'Bree, Velandai K. Srikanth
Dual-stream algorithms for dementia detection: Harnessing structured and unstructured electronic health record data, a novel approach to prevalence estimation
INTRODUCTION
Identifying individuals with dementia is crucial for prevalence estimation and service planning, but reliable, scalable methods are lacking. We developed a novel set of algorithms using both structured and unstructured electronic health record (EHR) data, applying Diagnostic and Statistical Manual of Mental Disorders criteria for dementia case identification.
METHODS
Our cohort (n = 1082) included individuals aged ≥ 60 years with dementia identified through specialist clinics and a comparison group without dementia. Clinicians from Australia and the United States informed predictor selection. We developed algorithms through a biostatistics stream for structured data and a natural language processing (NLP) stream for text, synthesizing results via logistic regression.
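The synthesis step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each stream has already produced one probability-like score per person, and it fits a plain logistic regression on those two scores by gradient descent. The data are synthetic, and every name (`stream_score`, `fit_logistic`, `structured_scores`, `text_scores`) is hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic, purely illustrative data: 100 cases (1) and 100 controls (0).
rng = random.Random(0)
labels = [1] * 100 + [0] * 100


def stream_score(label):
    # Hypothetical stream output: a noisy score clipped to [0, 1].
    return min(1.0, max(0.0, rng.gauss(0.65 if label else 0.35, 0.2)))


structured_scores = [stream_score(l) for l in labels]  # stand-in for the structured stream
text_scores = [stream_score(l) for l in labels]        # stand-in for the NLP stream

# Combine the two stream scores with a second-level logistic regression.
features = list(zip(structured_scores, text_scores))
weights = fit_logistic(features, labels)
combined = [predict_proba(weights, x) for x in features]
```

In practice the combining model would be fitted and evaluated on separate data (for example via cross-validation) so that the combined score is not scored on the records used to train it; the sketch skips that for brevity.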
RESULTS
The final structured model retained 16 variables (area under the receiver operating characteristic curve [AUC] 0.853, specificity 72.2%, sensitivity 80.6%). NLP classifiers (logistic regression, support vector machine, and random forest models) performed comparably. The final, combined model outperformed all others (AUC = 0.951, P < 0.001 for comparison to structured model).
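For readers less familiar with the metrics reported above, the following sketch shows how AUC, sensitivity, and specificity can be computed from model scores. The scores, labels, and the 0.5 threshold are made-up toy values, not study data, and the function names are hypothetical. AUC is computed here as the probability that a randomly chosen case scores higher than a randomly chosen control, which is equivalent to the area under the ROC curve.

```python
def auc(scores, labels):
    """AUC as P(case score > control score); ties count as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive, then return (sensitivity, specificity)."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: four cases (label 1) and four controls (label 0).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

toy_auc = auc(toy_scores, toy_labels)  # 15 of 16 case-control pairs ranked correctly
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels, 0.5)
print(toy_auc, toy_sens, toy_spec)  # prints 0.9375 0.75 0.75
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking performance across all thresholds, which is why the abstract reports both kinds of figure.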
DISCUSSION
Embedding text-derived insights within algorithms trained on structured medical data significantly enhances dementia identification capacity.
Highlights
Algorithmic tools for detection of individuals with dementia are available; however, previous work has used heterogeneous case definitions which are not clinically meaningful, and has relied on proxies such as diagnostic codes or medications for case ascertainment.
We used a novel, dual-stream algorithmic development approach, simultaneously and separately modeling a clinically meaningful outcome (diagnosis of dementia according to specialized clinical impression) using structured and unstructured electronic health record datasets.
Our clinically grounded case definition supported the inclusion of key structured variables (such as dementia International Classification of Disease codes and medications) as modeling predictors rather than outcomes.
Our algorithms, published in detail to support validation and replication, represent a major step forward in the use of routinely collected data for detection of diagnosed dementia.
About the journal:
Alzheimer's & Dementia is a peer-reviewed journal that aims to bridge knowledge gaps in dementia research by covering the entire spectrum, from basic science to clinical trials to social and behavioral investigations. It provides a platform for rapid communication of new findings and ideas, optimal translation of research into practical applications, increasing knowledge across diverse disciplines for early detection, diagnosis, and intervention, and identifying promising new research directions. In July 2008, Alzheimer's & Dementia was accepted for indexing by MEDLINE, recognizing its scientific merit and contribution to Alzheimer's research.