Enhancing Alzheimer's disease detection: An explainable machine learning approach with ensemble techniques

Intelligence-based medicine · Pub Date: 2025-01-01 · DOI: 10.1016/j.ibmed.2025.100240

Eram Mahamud, Md Assaduzzaman, Jahirul Islam, Nafiz Fahad, Md Jakir Hossen, Thirumalaimuthu Thirumalaiappan Ramanathan

Intelligence-based medicine, Volume 11, Article 100240. Journal Article. https://www.sciencedirect.com/science/article/pii/S2666521225000444


Abstract

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder for which early and accurate diagnosis is essential to effective intervention. This study presents a novel machine learning (ML)-driven predictive framework for AD diagnosis that integrates Explainable Artificial Intelligence (XAI) methods to enhance interpretability. The dataset, sourced from Kaggle, comprises 2149 patient records with 34 distinct attributes covering demographic, clinical, and lifestyle-related factors. To improve model robustness, the data were preprocessed with mean/mode imputation for missing values, min-max normalization for feature scaling, and class balancing via SMOTE, SMOTEENN, and ADASYN. Feature selection was performed using Chi-Square and Recursive Feature Elimination (RFE) to retain the most relevant predictors. Various ML models, including Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, AdaBoost, XGBoost, K-Nearest Neighbors (KNN), and Gradient Boosting, were assessed using accuracy, precision, recall, F1-score, and AUC (Area Under the Curve). The proposed ensemble model, which combines LightGBM (LGBM) and Random Forest (RF) through soft voting on Chi-Square-selected features, achieved the highest test accuracy of 96.35%, surpassing existing models. Additionally, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) were used to interpret the model's decision-making process, identifying key risk factors and improving transparency for clinical applications. These findings highlight the potential of ML and XAI in advancing AD diagnosis; future work aims to validate the model on larger, more diverse datasets and to integrate it into real-world clinical workflows.
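
Two of the steps named in the abstract, min-max normalization and soft voting, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the example probabilities, and the equal model weights are all assumptions made for illustration, and no real LightGBM or Random Forest model is trained here.

```python
from typing import List, Optional, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Average per-class probabilities across models and pick the argmax.

    prob_sets[m][i][c] is model m's probability that sample i is in class c.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)  # assumption: equal weights
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    labels = []
    for i in range(n_samples):
        avg = [
            sum(w * prob_sets[m][i][c] for m, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


# Hypothetical probabilities [P(no AD), P(AD)] for three patients,
# standing in for the outputs of an LGBM model and an RF model.
lgbm_probs = [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]]
rf_probs = [[0.80, 0.20], [0.30, 0.70], [0.35, 0.65]]

predictions = soft_vote([lgbm_probs, rf_probs])
scaled_age = min_max_scale([60, 75, 90])  # hypothetical ages
```

For the third hypothetical patient the two models disagree on the most likely class. Soft voting settles this by averaging the probabilities, so the more confident model decides the outcome, whereas a majority vote over hard labels would leave a tie.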


