Improving stroke risk prediction by integrating XGBoost, optimized principal component analysis, and explainable artificial intelligence

Authors: Lesia Mochurad, Viktoriia Babii, Yuliia Boliubash, Yulianna Mochurad
Journal: BMC Medical Informatics and Decision Making, 25(1):63 (Journal Article; impact factor 3.3; JCR Q2, Medical Informatics)
Published: 2025-02-07
DOI: https://doi.org/10.1186/s12911-025-02894-z
Citations: 0
Abstract
Cerebrovascular diseases are increasingly common, and stroke in particular is one of the leading causes of disability and mortality worldwide. To improve the efficiency and interpretability of stroke risk prediction models, we propose integrating modern machine learning algorithms with dimensionality reduction methods, specifically XGBoost and optimized principal component analysis (PCA), which structure the data and increase processing speed, especially for large datasets. For the first time, explainable artificial intelligence (XAI) is integrated into the PCA process, which increases transparency and interpretability and gives medical professionals a better understanding of risk factors. The proposed approach was tested on two datasets, achieving accuracies of 95% and 98%. Cross-validation yielded an average score of 0.99, and high values of the Matthews correlation coefficient (MCC, 0.96) and Cohen's kappa (CK, 0.96) support the generalizability and reliability of the model. OpenMP parallelization increases processing speed threefold, which makes the approach practical to apply. The proposed method is therefore innovative and can potentially improve forecasting systems in healthcare.
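For readers unfamiliar with the two agreement metrics quoted in the abstract, the following is a minimal sketch of how the MCC and Cohen's kappa are computed from a binary confusion matrix. It is not the authors' code: the helper names (`confusion_counts`, `mcc`, `cohen_kappa`) and the toy labels are hypothetical, and returning 0.0 when a metric is undefined is a convention chosen here for illustration only.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = stroke and 0 = no stroke."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient: (TP*TN - FP*FN) / sqrt of the marginal products."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Undefined when any marginal is zero; 0.0 is an illustrative convention.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the row and column marginals of the confusion matrix.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Undefined when chance agreement is 1; 0.0 is an illustrative convention.
    return 0.0 if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)


# Toy labels (made up for illustration, not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts[0] + counts[1]) / len(y_true)
mcc_value = mcc(*counts)
kappa_value = cohen_kappa(*counts)

print(f"accuracy={accuracy:.3f} mcc={mcc_value:.3f} kappa={kappa_value:.3f}")
```

On this toy example accuracy is 0.8 while both the MCC and kappa are about 0.583. Unlike plain accuracy, both metrics account for class imbalance, which is why they are commonly reported alongside it for datasets where stroke cases are a minority.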
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.