A Multivariable Prediction Model for Mild Cognitive Impairment and Dementia: Algorithm Development and Validation.

Impact factor 3.1 · CAS Zone 3 (Medicine) · JCR Q2 (Medical Informatics)
Sarah Soyeon Oh, Bada Kang, Dahye Hong, Jennifer Ivy Kim, Hyewon Jeong, Jinyeop Song, Minkyu Jeon
JMIR Medical Informatics, vol. 12, e59396. Published 2024-11-22. Journal Article.
DOI: 10.2196/59396 (https://doi.org/10.2196/59396)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11624448/pdf/
Citations: 0

Abstract

Background: Mild cognitive impairment (MCI) poses significant challenges in early diagnosis and timely intervention. Underdiagnosis, coupled with the economic and social burden of dementia, necessitates more precise detection methods. Machine learning (ML) algorithms show promise in managing complex data for MCI and dementia prediction.

Objective: This study assessed the predictive accuracy of ML models in identifying the onset of MCI and dementia using the Korean Longitudinal Study of Aging (KLoSA) dataset.

Methods: This study used 2018 to 2020 data from the KLoSA, a comprehensive biennial survey that tracks the demographic, health, and socioeconomic aspects of middle-aged and older Korean adults. Among the 6171 initial households, 4975 eligible participants aged 60 years or older were selected after excluding individuals based on age and missing data. MCI and dementia were identified from self-reported diagnoses, with sociodemographic and health-related variables serving as key covariates. The dataset was split into training and test sets, which were used to fit and evaluate multiple models for predicting MCI and dementia: logistic regression, light gradient-boosting machine, XGBoost (extreme gradient boosting), CatBoost, random forest, gradient boosting, AdaBoost, support vector classifier, and k-nearest neighbors. Performance was assessed using the area under the receiver operating characteristic curve (AUC). Class imbalances were addressed via weights. Shapley additive explanation values were used to determine the contribution of each feature to the predictions.
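
The two evaluation ingredients named in the Methods, class weights and AUC, can be illustrated with a minimal standard-library sketch. The abstract does not state which weighting scheme was used, so the inverse-frequency ("balanced") heuristic below is an assumption, and the labels and scores are hypothetical toy values, not KLoSA data. The function names `balanced_class_weights` and `roc_auc` are illustrative only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: the abstract only says imbalance was handled "via weights";
    this common heuristic is shown purely as an example.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = self-reported diagnosis, 0 = no diagnosis.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.10, 0.20, 0.30, 0.35, 0.40, 0.80, 0.70, 0.90]

weights = balanced_class_weights(y_true)
auc = roc_auc(y_true, y_score)
print(weights, round(auc, 4))
```

The pairwise definition makes clear why an AUC near 0.5 means the scores rank cases no better than chance, which is the relevant reading for the weaker MCI models reported in the Results.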

Results: Among the 4975 participants, the best model for predicting MCI onset was random forest, with a median AUC of 0.6729 (IQR 0.3883-0.8152), followed by k-nearest neighbors with a median AUC of 0.5576 (IQR 0.4555-0.6761) and support vector classifier with a median AUC of 0.5067 (IQR 0.3755-0.6389). For dementia onset prediction, the best model was XGBoost, achieving a median AUC of 0.8185 (IQR 0.8085-0.8285), closely followed by light gradient-boosting machine with a median AUC of 0.8069 (IQR 0.7969-0.8169) and AdaBoost with a median AUC of 0.8007 (IQR 0.7907-0.8107). The Shapley values highlighted pain in everyday life, being widowed, living alone, exercising, and living with a partner as the strongest predictors of MCI. For dementia, the most predictive features were other contributing factors, education at the high school level, education at the middle school level, exercising, and monthly social engagement.
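
The "median AUC (IQR)" figures above summarize a set of AUC values per model. The abstract does not say how those repeated values were produced (for example cross-validation folds or resampled splits), so the sketch below simply assumes a list of AUCs is already available; the numbers are hypothetical and `summarize_auc` is an illustrative name.

```python
import statistics


def summarize_auc(aucs):
    """Format a list of AUC values as 'median (IQR q1-q3)'.

    Quartiles use the 'inclusive' method, which treats the list as the
    whole population of runs; other quartile conventions give slightly
    different bounds.
    """
    q1, med, q3 = statistics.quantiles(aucs, n=4, method="inclusive")
    return f"{med:.4f} (IQR {q1:.4f}-{q3:.4f})"


# Hypothetical AUCs from five repeated evaluations of one model.
run_aucs = [0.78, 0.80, 0.81, 0.82, 0.83]
summary = summarize_auc(run_aucs)
print(summary)
```

A wide IQR, such as the one reported for random forest on MCI, indicates that performance varied strongly between runs, so the median alone understates the uncertainty.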

Conclusions: ML algorithms, especially XGBoost, exhibited the potential for predicting MCI onset using KLoSA data. However, no model has demonstrated robust accuracy in predicting MCI and dementia. Sociodemographic and health-related factors are crucial for initiating cognitive conditions, emphasizing the need for multifaceted predictive models for early identification and intervention. These findings underscore the potential and limitations of ML in predicting cognitive impairment in community-dwelling older adults.

Source journal: JMIR Medical Informatics (Medicine, Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.