Enhancing Alzheimer's disease prediction using random forest: A novel framework combining backward feature elimination and ant colony optimization

IF 3.2 · Zone 4 (Medicine) · JCR Q2: MEDICINE, RESEARCH & EXPERIMENTAL
Afeez A. Soladoye, Nicholas Aderinto, Bolaji A. Omodunbi, Adebimpe O. Esan, Ibrahim A. Adeyanju, David B. Olawade
Current Research in Translational Medicine, vol. 73, no. 4, Article 103526. Published 2025-07-01. DOI: 10.1016/j.retram.2025.103526
https://www.sciencedirect.com/science/article/pii/S2452318625000352
Citations: 0

Abstract

Background

Alzheimer's disease (AD) represents a significant global health challenge due to its increasing prevalence and the limitations of current diagnostic approaches. Early detection is crucial as pathological changes occur 10-15 years before clinical symptoms manifest, yet current diagnostic methods typically identify the disease at moderate to advanced stages. Machine learning techniques offer promising solutions for early prediction, but face challenges related to feature selection and hyperparameter optimization.

Objective

To develop an enhanced predictive model for Alzheimer's disease by integrating advanced feature selection techniques with nature-inspired hyperparameter optimization for Random Forest classifiers while ensuring robust validation and statistical significance testing.

Methods

This study employed three feature selection techniques (Whale Optimization Algorithm, Artificial Bee Colony, and Backward Elimination Feature Selection) and two hyperparameter optimization algorithms (Artificial Ant Colony Optimization and Bald Eagle Search) to improve Random Forest model performance. A dataset comprising 2,149 instances with 34 features was preprocessed using MinMax normalization and Synthetic Minority Oversampling Technique (SMOTE) applied only to training data to prevent data leakage. Statistical significance testing using McNemar's test was conducted to compare model performances. Model performance was evaluated using accuracy, precision, recall, F1-score, and AUC with confidence intervals calculated using bootstrap sampling.
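The leakage-avoidance step described above (scaling and oversampling fitted on training data only) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `fit_minmax`, `apply_minmax`, and `smote_like` are hypothetical, the oversampler is a simplified SMOTE-style interpolation for a binary problem with at least two minority samples, and the toy data are invented for illustration.

```python
import random

def fit_minmax(rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_minmax(rows, lo, hi):
    """Scale rows with previously learned bounds; constant features map to 0."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def smote_like(X, y, minority, k=3, seed=0):
    """Simplified SMOTE-style oversampling (assumption: binary labels).

    Each synthetic row is a random interpolation between a minority row
    and one of its k nearest minority neighbours, repeated until the two
    classes have equal counts.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority_rows)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out

# Toy data, invented for illustration only.
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [2.5, 15.0], [3.5, 35.0]]
y_train = [0, 0, 0, 0, 1, 1]
X_test = [[5.0, 25.0]]

lo, hi = fit_minmax(X_train)                     # bounds come from training data only
X_train_scaled = apply_minmax(X_train, lo, hi)
X_test_scaled = apply_minmax(X_test, lo, hi)     # test rows reuse training bounds
X_bal, y_bal = smote_like(X_train_scaled, y_train, minority=1)  # test set is never resampled
```

Because the test rows are scaled with the training bounds, they may fall outside [0, 1]; that is expected and is the price of keeping test information out of the fitted preprocessing.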

Results

The combination of Backward Elimination Feature Selection with Artificial Ant Colony Optimization achieved the highest performance (95% accuracy ± 1.2%, 95% precision ± 1.1%, 94% recall ± 1.3%, 95% F1-score ± 1.0%, 98% AUC ± 0.8%), outperforming other methodological combinations and conventional machine learning algorithms with statistically significant improvements (p < 0.001). This approach identified 26 significant features associated with Alzheimer's disease. Additionally, nature-inspired optimization algorithms demonstrated substantial computational efficiency advantages over empirical approaches (18 minutes versus 133 minutes).
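The p-value and the ± intervals quoted above are reported as coming from McNemar's test and bootstrap sampling. The sketch below shows one common way to compute such quantities; it is an illustration under stated assumptions, not the authors' procedure. The function names `mcnemar_exact` and `bootstrap_accuracy_ci` are hypothetical, the exact (binomial) form of McNemar's test is used, and the interval is a plain percentile bootstrap of accuracy.

```python
import math
import random

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on the discordant pairs of two classifiers."""
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree on correctness
    # Under H0 the discordant pairs follow Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    # Resample test cases with replacement and recompute accuracy each time.
    stats = sorted(
        sum(correct[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy predictions, invented for illustration only.
y_true = [1] * 20
pred_a = [1] * 20               # classifier A: always correct
pred_b = [0] * 10 + [1] * 10    # classifier B: wrong on the first ten cases

p_value = mcnemar_exact(y_true, pred_a, pred_b)
ci_low, ci_high = bootstrap_accuracy_ci(y_true, pred_b)
```

McNemar's test only looks at cases where the two models disagree, which is why it suits paired comparisons on a shared test set; the bootstrap interval depends on the number of resamples and the seed, so small differences between runs are expected.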

Conclusion

The integration of advanced feature selection with nature-inspired hyperparameter optimization enhances Alzheimer's disease prediction accuracy while improving computational efficiency. However, external validation on independent datasets and prospective clinical studies are needed to establish real-world utility. This methodological framework offers promising applications for early diagnosis and intervention planning, with potential extensions to other complex medical prediction tasks.
Source journal
Current Research in Translational Medicine
Subject area: Biochemistry, Genetics and Molecular Biology (General Biochemistry, Genetics and Molecular Biology)
CiteScore: 7.00
Self-citation rate: 4.90%
Articles published: 51
Review time: 45 days
About the journal: Current Research in Translational Medicine is a peer-reviewed journal, publishing worldwide clinical and basic research in the field of hematology, immunology, infectiology, hematopoietic cell transplantation, and cellular and gene therapy. The journal considers for publication English-language editorials, original articles, reviews, and short reports including case reports. Contributions are intended to draw attention to experimental medicine and translational research. Current Research in Translational Medicine periodically publishes thematic issues and is indexed in all major international databases (2017 Impact Factor is 1.9). Core areas covered in Current Research in Translational Medicine are: Hematology, Immunology, Infectiology, Hematopoietic Cell Transplantation, Cellular and Gene Therapy.