Md. Maruf Hossain, Md. Mahfuz Ahmed, Md. Rakibul Hasan Rakib, Mohammad Osama Zia, Rakib Hasan, Md. Rakibul Islam, Md. Shohidul Islam, Md Shahariar Alam, Md. Khairul Islam
Health Science Reports 8(5), published 2025-05-05. DOI: 10.1002/hsr2.70799. https://onlinelibrary.wiley.com/doi/10.1002/hsr2.70799
Optimizing Stroke Risk Prediction: A Primary Dataset-Driven Ensemble Classifier With Explainable Artificial Intelligence
Background and Aims
Stroke remains a leading cause of mortality and long-term disability worldwide, presenting a significant global health challenge. Effective early prediction models are essential for reducing its impact. This study introduces a novel ensemble method for predicting stroke using two datasets: a primary dataset collected from a hospital, containing medical histories and clinical parameters, and a secondary dataset.
Methods
We applied several preprocessing techniques, including outlier detection, data normalization, k-means clustering, and missing value detection, to refine the datasets. A novel ensemble classifier was developed, combining AdaBoost, Gradient Boosting Machine (GBM), Multilayer Perceptron (MLP), and Random Forest (RF) algorithms to enhance predictive accuracy. Additionally, Explainable Artificial Intelligence (XAI) techniques such as SHAP and LIME were integrated to elucidate key features influencing stroke prediction.
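The combination of the four base learners can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the abstract does not say how the models are combined, so soft voting (averaging predicted probabilities) is an assumption, as are all hyperparameters, the function names `build_ensemble` and `evaluate`, and the synthetic data that stands in for the hospital dataset. Standard scaling stands in for the normalization step; outlier handling, k-means clustering, and the SHAP/LIME analysis are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_ensemble(seed=0):
    """Normalize features, then soft-vote over the four base models.

    Soft voting is an assumption; the source only says the models are combined.
    """
    voter = VotingClassifier(
        estimators=[
            ("ada", AdaBoostClassifier(n_estimators=50, random_state=seed)),
            ("gbm", GradientBoostingClassifier(random_state=seed)),
            ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                  random_state=seed)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ],
        voting="soft",  # average the predicted class probabilities
    )
    return Pipeline([("scale", StandardScaler()), ("vote", voter)])


def evaluate(seed=0):
    """Fit on synthetic binary data (a placeholder, not clinical data)
    and return accuracy on a held-out split."""
    X, y = make_classification(n_samples=400, n_features=10,
                               n_informative=5, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_ensemble(seed).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"held-out accuracy: {evaluate():.3f}")
```

Placing the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the normalization step.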
Results
The proposed ensemble classifier achieved an accuracy of 95% for the secondary dataset and 80.36% for the primary dataset. Comparative analysis with other machine learning models highlighted the superior performance of the ensemble approach. The integration of XAI further provided insights into the critical indicators influencing stroke classification, improving model interpretability and decision-making.
Conclusion
Our study demonstrates that the novel ensemble classifier, supported by effective preprocessing and XAI techniques, is a powerful tool for stroke prediction. The accuracies achieved (95% on the secondary dataset and 80.36% on the primary dataset) support its effectiveness and its potential for practical clinical application. Future work will focus on incorporating deep learning techniques and medical imaging to further improve classification accuracy and model performance.