Development and validation of a machine learning model for early screening of high-risk mild cognitive impairment from the multi-cohort data
Xuan Wu, Xuecheng Yao, Jianing Shi, Mengling Tang, Qingli Zhou, Kun Chen
International Journal of Medical Informatics, Volume 203, Article 106030. Published 2025-07-02. DOI: 10.1016/j.ijmedinf.2025.106030
Impact factor: 4.1 (JCR Q2, Computer Science, Information Systems)
https://www.sciencedirect.com/science/article/pii/S1386505625002473
Citation count: 0
Abstract
Background
Early screening of mild cognitive impairment (MCI) in older populations is crucial for timely intervention. MCI often precedes dementia, but current diagnostic tools are time-consuming and not widely accessible. Utilizing basic physical examination data may enable earlier, more practical screening.
Methods
Data from the China Health and Retirement Longitudinal Study (CHARLS) 2015 were used to develop the model. Two external datasets from CHARLS 2011 and Yiwu 2021 cohorts were used for validation. A total of 34 variables were considered, including demographics, health conditions, lifestyle, and physical and blood examination data. The Mini-Mental State Examination (MMSE) was used for MCI diagnosis. Seven key variables (education, grip strength, height, weight, creatinine, mean corpuscular volume, and platelet count) were selected through majority voting. Five machine learning models were evaluated, and a Random Forest (RF) model was chosen based on its superior performance.
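The abstract does not say which selectors took part in the majority vote or what threshold was applied, so the following is only a minimal sketch of the general idea: several feature-selection methods each nominate a subset of candidate variables, and a variable is kept when more than half of the methods nominate it. The selector names, their nominated lists, and the `majority_vote` helper are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Keep features nominated by at least `min_votes` selectors.

    `selections` maps a selector name to the list of features it chose.
    By default a strict majority (more than half of the selectors) is required.
    """
    n_selectors = len(selections)
    if min_votes is None:
        min_votes = n_selectors // 2 + 1
    # Count each feature once per selector, even if a selector lists it twice.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Hypothetical selector outputs, for illustration only.
selections = {
    "selector_a": ["education", "grip_strength", "height", "creatinine"],
    "selector_b": ["education", "grip_strength", "weight", "platelet_count"],
    "selector_c": ["education", "height", "weight", "age"],
}

selected = majority_vote(selections)
print(selected)
```

With three selectors the default threshold is two votes, so a variable nominated by only one method is dropped. Raising `min_votes` gives a stricter, smaller feature set.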
Results
The model demonstrated high diagnostic performance with a sensitivity of 0.906, specificity of 0.850, and accuracy of 85.5%. The area under the receiver operating characteristic curve (AUROC) was 0.93, and the area under the precision-recall curve (AUPRC) was 0.93. In the external validation, AUROCs of 0.83 and 0.87 were achieved. The model was enhanced with an explainable method and deployed via a Streamlit-based web application.
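As a reading aid for the reported metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from confusion-matrix counts, and how the three values constrain each other. Accuracy is a prevalence-weighted average of sensitivity and specificity, so the reported figures imply an approximate share of MCI cases in the evaluation set. This assumes all three metrics were computed on the same data, which the abstract does not state. The confusion-matrix counts and both function names are hypothetical.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases flagged
    specificity = tn / (tn + fp)  # share of non-cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def implied_prevalence(sensitivity, specificity, accuracy):
    """Solve accuracy = sens * p + spec * (1 - p) for the prevalence p."""
    if sensitivity == specificity:
        raise ValueError("prevalence is not identifiable when sens == spec")
    return (accuracy - specificity) / (sensitivity - specificity)


# Hypothetical counts, chosen only to illustrate the formulas.
sens, spec, acc = screening_metrics(tp=90, fn=10, tn=850, fp=150)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")

# Values reported in the abstract (rounded, so the result is approximate).
reported_prev = implied_prevalence(0.906, 0.850, 0.855)
print(f"implied prevalence ~ {reported_prev:.3f}")
```

Under that assumption the reported values point to a prevalence of roughly 9%. Because the inputs are rounded to three digits, the estimate is sensitive to rounding and is best read as an order of magnitude.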
Conclusions
This study developed machine learning-based models for early MCI screening in older populations using basic physical examination data, together with a web calculator for MCI risk prediction (https://mciscreening.streamlit.app/). Both demonstrated favorable performance, generalizability, and effective clinical implementation.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.