Development and validation of a machine learning model for early screening of high-risk mild cognitive impairment from the multi-cohort data

IF 4.1 | Zone 2, Medicine | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Medical Informatics | Pub Date: 2025-07-02 | DOI: 10.1016/j.ijmedinf.2025.106030

Xuan Wu, Xuecheng Yao, Jianing Shi, Mengling Tang, Qingli Zhou, Kun Chen

Volume 203, Article 106030. Publisher page: https://www.sciencedirect.com/science/article/pii/S1386505625002473

Citations: 0

Abstract

Background

Early screening of mild cognitive impairment (MCI) in older populations is crucial for timely intervention. MCI often precedes dementia, but current diagnostic tools are time-consuming and not widely accessible. Utilizing basic physical examination data may enable earlier, more practical screening.

Methods

Data from the China Health and Retirement Longitudinal Study (CHARLS) 2015 were used to develop the model. Two external datasets from CHARLS 2011 and Yiwu 2021 cohorts were used for validation. A total of 34 variables were considered, including demographics, health conditions, lifestyle, and physical and blood examination data. The Mini-Mental State Examination (MMSE) was used for MCI diagnosis. Seven key variables (education, grip strength, height, weight, creatinine, mean corpuscular volume, and platelet count) were selected through majority voting. Five machine learning models were evaluated, and a Random Forest (RF) model was chosen based on its superior performance.
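The abstract does not say which feature-selection methods cast the votes. The sketch below only illustrates the general idea of majority voting over several selectors. The three candidate lists, the `majority_vote` helper, and the extra variables `age` and `sleep_hours` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Return the features picked by at least `min_votes` selectors.

    By default a strict majority of the selectors is required.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Count each feature once per selector, even if a selector lists it twice.
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of three feature-selection methods (illustrative only).
selections = [
    ["education", "grip_strength", "height", "age"],
    ["education", "grip_strength", "creatinine"],
    ["education", "height", "creatinine", "sleep_hours"],
]

selected = majority_vote(selections)
print(selected)
```

With three selectors the default threshold is two votes, so a variable proposed by only one method is dropped. Raising `min_votes` gives a stricter, smaller feature set.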

Results

The model demonstrated high diagnostic performance, with a sensitivity of 0.906, a specificity of 0.850, and an accuracy of 0.855. The area under the receiver operating characteristic curve (AUROC) was 0.93, and the area under the precision-recall curve (AUPRC) was 0.93. In external validation, the model achieved AUROCs of 0.83 and 0.87. The model was enhanced with an explainable method and deployed via a Streamlit-based web application.
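For readers less familiar with these metrics, the following sketch shows how sensitivity, specificity, accuracy, and AUROC are computed. It is a minimal illustration: the confusion-matrix counts and the risk scores are hypothetical and are not the study's data, and the helper names are assumptions introduced here.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true MCI cases that are flagged
    specificity = tn / (tn + fp)  # share of non-cases that are cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random case outscores a random non-case.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts for 1,000 screened people, 100 of them with MCI.
sens, spec, acc = screening_metrics(tp=90, fn=10, tn=765, fp=135)
prevalence = 100 / 1000

# Accuracy is a prevalence-weighted average of sensitivity and specificity.
weighted = sens * prevalence + spec * (1 - prevalence)
print(sens, spec, acc, weighted)

# Hypothetical predicted risks for three cases and three non-cases.
auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(auc)
```

Because accuracy weights specificity by the share of non-cases, it sits close to specificity when MCI is uncommon in the sample, which is why sensitivity and specificity are reported alongside it.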

Conclusions

This study developed machine learning-based models for early MCI screening in older populations using basic physical examination data, together with a web calculator for MCI risk prediction (https://mciscreening.streamlit.app/). Both demonstrated favorable performance, generalizability, and effective clinical implementation.




Source journal

International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)

CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days

About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.