Leveraging multi-site electronic health data for characterization of subtypes: a pilot study of dementia in the N3C Clinical Tenant.

JAMIA Open (IF 2.5, JCR Q2, Health Care Sciences & Services)
Pub Date: 2024-08-06; eCollection Date: 2024-10-01; DOI: 10.1093/jamiaopen/ooae076
Suchetha Sharma, Jiebei Liu, Amy Caroline Abramowitz, Carol Reynolds Geary, Karen C Johnston, Carol Manning, John Darrell Van Horn, Andrea Zhou, Alfred J Anzalone, Johanna Loomba, Emily Pfaff, Don Brown
Citations: 0

Abstract

Leveraging multi-site electronic health data for characterization of subtypes: a pilot study of dementia in the N3C Clinical Tenant.

Objectives: To provide a foundational methodology for differentiating comorbidity patterns in subphenotypes through investigation of a multi-site dementia patient dataset.

Materials and methods: Employing the National Clinical Cohort Collaborative Tenant Pilot (N3C Clinical) dataset, our approach integrates machine learning algorithms (logistic regression and eXtreme Gradient Boosting, XGBoost) with a diagnostic hierarchical model for nuanced classification of dementia subtypes based on comorbidities and gender. The methodology is enhanced by multi-site EHR data, implementing a hybrid sampling strategy combining 65% Synthetic Minority Over-sampling Technique (SMOTE), 35% Random Under-Sampling (RUS), and Tomek Links for class imbalance. The hierarchical model further refines the analysis, allowing for layered understanding of disease patterns.
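
One component of the sampling strategy named above, Tomek Links, can be illustrated with a small sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor; cleaning procedures commonly drop the majority-class member of each such pair. The code below is a standard-library illustration on made-up 2-D points, not the study's pipeline: the function names (`nearest_neighbor`, `find_tomek_links`, `remove_majority_in_links`), the Euclidean distance, and the toy data are all assumptions introduced here. The abstract does not specify how the 65% SMOTE and 35% RUS proportions were applied, so they are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_neighbor(i: int, X: List[Point]) -> int:
    """Return the index of the point closest to X[i] (Euclidean), excluding i itself."""
    best_j, best_d = -1, math.inf
    for j, other in enumerate(X):
        if j == i:
            continue
        d = math.dist(X[i], other)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def find_tomek_links(X: List[Point], y: List[int]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    links = []
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and y[i] != y[j]:
            links.append((i, j))
    return links


def remove_majority_in_links(
    X: List[Point], y: List[int], majority_label: int
) -> Tuple[List[Point], List[int]]:
    """Drop the majority-class member of every Tomek link (one common cleaning variant)."""
    drop = set()
    for i, j in find_tomek_links(X, y):
        for k in (i, j):
            if y[k] == majority_label:
                drop.add(k)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Hypothetical toy data: label 0 is the majority class, label 1 the minority class.
X_demo = [(0.0, 0.0), (0.1, 0.0), (-1.0, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
y_demo = [0, 0, 0, 0, 1, 1]

links_demo = find_tomek_links(X_demo, y_demo)
X_clean, y_clean = remove_majority_in_links(X_demo, y_demo, majority_label=0)
```

In this toy set, the majority point at (1.0, 1.0) and the minority point at (1.05, 1.0) form the only Tomek link, so the cleaning step removes the majority point and leaves the class boundary less ambiguous. In practice a library implementation would normally be preferred over this brute-force nearest-neighbor search.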

Results: The study identified significant comorbidity patterns associated with diagnosis of Alzheimer's, Vascular, and Lewy Body dementia subtypes. The classification models achieved accuracies up to 69% for Alzheimer's/Vascular dementia and highlighted challenges in distinguishing Dementia with Lewy Bodies. The hierarchical model elucidates the complexity of diagnosing Dementia with Lewy Bodies and reveals the potential impact of regional clinical practices on dementia classification.

Conclusion: Our methodology underscores the importance of leveraging multi-site datasets and tailored sampling techniques for dementia research. This framework holds promise for extending to other disease subtypes, offering a pathway to more nuanced and generalizable insights into dementia and its complex interplay with comorbid conditions.

Discussion: This study underscores the critical role of multi-site data analyses in understanding the relationship between comorbidities and disease subtypes. By utilizing diverse healthcare data, we emphasize the need to consider site-specific differences in clinical practices and patient demographics. Despite challenges like class imbalance and variability in EHR data, our findings highlight the essential contribution of multi-site data to developing accurate and generalizable models for disease classification.

Source journal
JAMIA Open (Medicine: Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 102
Review time: 16 weeks