Eram Mahamud, Md Assaduzzaman, Jahirul Islam, Nafiz Fahad, Md Jakir Hossen, Thirumalaimuthu Thirumalaiappan Ramanathan
Title: Enhancing Alzheimer's disease detection: An explainable machine learning approach with ensemble techniques
Journal: Intelligence-based medicine, volume 11, Article 100240
Publication date: 2025-01-01
DOI: 10.1016/j.ibmed.2025.100240
URL: https://www.sciencedirect.com/science/article/pii/S2666521225000444
Citation count: 0
Abstract
Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that requires early and accurate diagnosis for effective intervention. This study presents a novel machine learning (ML)-driven predictive framework for AD diagnosis that integrates Explainable Artificial Intelligence (XAI) methods to enhance interpretability. The dataset, sourced from Kaggle, comprises 2149 patient records with 34 attributes covering demographic, clinical, and lifestyle-related factors. To improve model robustness, the data were preprocessed with mean/mode imputation for missing values, min-max normalization for feature scaling, and class balancing via SMOTE, SMOTEENN, and ADASYN. Feature selection was performed using Chi-Square and Recursive Feature Elimination (RFE) to retain the most relevant predictors. Various ML models, including Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, AdaBoost, XGBoost, K-Nearest Neighbors (KNN), and Gradient Boosting, were assessed using accuracy, precision, recall, F1-score, and AUC (Area Under the Curve). The proposed ensemble model, which combines LightGBM (LGBM) and Random Forest (RF) through soft voting after Chi-Square feature selection, achieved the highest test accuracy of 96.35%, surpassing existing models. Additionally, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) were used to interpret the model's decision-making process, identifying key risk factors and improving transparency for clinical applications. These findings highlight the potential of ML and XAI in advancing AD diagnosis; future work aims to validate the model on larger, more diverse datasets and to integrate it into real-world clinical workflows.
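Two of the steps named in the abstract, min-max normalization and soft voting, can be illustrated with a small dependency-free Python sketch. This is not the authors' implementation: the function names `min_max_scale` and `soft_vote` are hypothetical, the probability values are invented for illustration, and the equal weighting of the two models is an assumption that the abstract does not confirm.

```python
from typing import List, Optional, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def soft_vote(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Average per-class probabilities across models, then take the argmax.

    prob_sets[m][i][c] is model m's probability that sample i is in class c.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)  # assumption: equal model weights
    total = sum(weights)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    labels = []
    for i in range(n_samples):
        averaged = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=averaged.__getitem__))
    return labels


# Invented probabilities [P(no AD), P(AD)] for three patients.
lgbm_probs = [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]]
rf_probs = [[0.80, 0.20], [0.30, 0.70], [0.35, 0.65]]

predictions = soft_vote([lgbm_probs, rf_probs])
scaled_ages = min_max_scale([60, 75, 90])  # hypothetical age column
```

For the third patient the two models disagree on the most likely class, and the averaged probabilities favor the more confident model. Hard (majority) voting would instead produce a tie between the two models in that case.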