Unsupervised Deep Learning of Electronic Health Records to Characterize Heterogeneity Across Alzheimer Disease and Related Dementias: Cross-Sectional Study.

Impact factor: 5.0 · JCR Q1 (Geriatrics & Gerontology)
JMIR Aging · Pub Date: 2025-03-31 · DOI: 10.2196/65178
Matthew West, You Cheng, Yingnan He, Yu Leng, Colin Magdamo, Bradley T Hyman, John R Dickson, Alberto Serrano-Pozo, Deborah Blacker, Sudeshna Das
JMIR Aging 2025;8:e65178. https://doi.org/10.2196/65178

Abstract

Background: Alzheimer disease and related dementias (ADRD) exhibit prominent heterogeneity. Identifying clinically meaningful ADRD subtypes is essential for tailoring treatments to specific patient phenotypes.

Objective: We aimed to use unsupervised learning techniques on electronic health records (EHRs) from memory clinic patients to identify ADRD subtypes.

Methods: We used pretrained embeddings of non-ADRD diagnosis codes (International Classification of Diseases, Ninth Revision) and large language model (LLM)-derived embeddings of clinical notes from patient EHRs. Hierarchical clustering of these embeddings was used to identify ADRD subtypes. Clusters were characterized regarding their demographic and clinical features.
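The clustering step described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the distance metric, the linkage rule, or the embedding dimensionality, so cosine distance, average linkage, and the three-dimensional `toy_embeddings` below are all assumptions chosen only to make the example self-contained.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def agglomerative_clusters(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per vector."""
    # Start with every vector in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(vectors))]
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical patient embeddings (illustrative values, not study data).
toy_embeddings = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.1],
    [0.1, 1.0, 0.0], [0.0, 0.9, 0.2],
    [0.1, 0.0, 1.0], [0.2, 0.1, 0.9],
]
labels = agglomerative_clusters(toy_embeddings, n_clusters=3)
print(labels)
```

In a real analysis each row would be a patient-level embedding (for example, an aggregate of diagnosis-code embeddings or a note embedding), and the number of clusters would be chosen from the dendrogram or a validity index rather than fixed in advance.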

Results: We analyzed a cohort of 3454 patients with ADRD from a memory clinic at Massachusetts General Hospital, each with a specialist diagnosis. Clustering pretrained embeddings of the non-ADRD diagnosis codes in patient EHRs revealed the following 3 patient subtypes: one with skin conditions, another with psychiatric disorders and an earlier age of onset, and a third with diabetes complications. Similarly, using LLM-derived embeddings of clinical notes, we identified 3 subtypes of patients as follows: one with psychiatric manifestations and higher prevalence of female participants (prevalence ratio: 1.59), another with cardiovascular and motor problems and higher prevalence of male participants (prevalence ratio: 1.75), and a third one with geriatric health disorders. Notably, we observed significant overlap between clusters from both data modalities (χ²(4)=89.4; P<.001).
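The two statistics reported in the Results can be sketched as follows. A chi-square test with 4 degrees of freedom is what a 3×3 cross-tabulation of the two cluster assignments yields, since (3−1)×(3−1)=4. The counts in `toy_table` are invented for illustration and are not the study's data. The abstract also does not define the reference group for the prevalence ratio, so the sketch assumes it compares the proportion inside a cluster with the proportion in the rest of the cohort.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi_square_sf_even_dof(stat, dof):
    """Upper-tail probability of a chi-square variable; closed form for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

def prevalence_ratio(cases_in, size_in, cases_out, size_out):
    """Proportion inside a cluster divided by the proportion outside it (assumed definition)."""
    return (cases_in / size_in) / (cases_out / size_out)

# Hypothetical cross-tabulation: rows are diagnosis-code clusters,
# columns are clinical-note clusters (made-up counts).
toy_table = [
    [30, 10, 10],
    [10, 30, 10],
    [10, 10, 30],
]
stat, dof = chi_square_independence(toy_table)
p_value = chi_square_sf_even_dof(stat, dof)
print(stat, dof, p_value)

# Consistency check on the reported statistic: chi-square = 89.4 with 4 df.
p_reported = chi_square_sf_even_dof(89.4, 4)
print(p_reported < 0.001)

# Hypothetical example: 60 of 100 patients in a cluster vs 40 of 100 outside it.
print(prevalence_ratio(60, 100, 40, 100))
```

The closed-form tail probability is used only to keep the sketch dependency-free. A statistics library would normally supply the test for arbitrary degrees of freedom.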

Conclusions: By integrating International Classification of Diseases, Ninth Revision codes and LLM-derived embeddings, our analysis delineated 2 distinct ADRD subtypes with sex-specific comorbid and clinical presentations, offering insights for potential precision medicine approaches.

Source journal: JMIR Aging (Social Sciences: Health (social science))
CiteScore: 6.50
Self-citation rate: 4.10%
Articles published: 71
Review time: 12 weeks