Abrar Faiaz Eram , Aliva Sadnim Mahmud , Marwan Mostafa Khadem , Md Amimul Ihsan
DOI: 10.1016/j.neuri.2025.100215
Neuroscience Informatics, vol. 5, no. 3, Article 100215. Published 2025-06-18.
https://www.sciencedirect.com/science/article/pii/S2772528625000305
Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets
Background:
Brain stroke, characterized by interrupted blood flow to the brain, poses significant mortality risks and quality-of-life impacts. While machine learning approaches show promise in stroke prediction, current research often relies on synthetic data to address dataset imbalance, potentially compromising real-world model performance in clinical settings.
Method:
This research proposes an alternative approach focusing on recall as the primary evaluation metric for stroke prediction, specifically targeting the reduction of false negatives. In the context of stroke diagnosis, where missed detection can lead to severe consequences or fatality, recall is crucial for directly measuring the model's ability to identify actual stroke cases.
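To make the choice of metric concrete, here is a minimal sketch of how recall differs from accuracy on an imbalanced label set. The labels are made-up toy values, not data from the study, and the function names are illustrative only.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for the positive class (stroke = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive cases")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy, hypothetical labels: 2 stroke cases among 20 patients.
y_true = [1, 1] + [0] * 18
# A degenerate model that never predicts stroke.
y_all_negative = [0] * 20

print(accuracy(y_true, y_all_negative))  # high, despite missing every stroke
print(recall(y_true, y_all_negative))    # exposes the missed cases
```

A model that never predicts stroke still scores well on accuracy here while identifying no stroke case at all, which is the failure mode that a recall-first evaluation is meant to expose.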
Results:
Three models performed best: Logistic Regression, a Soft Voting ensemble (combining Gaussian Naive Bayes and Logistic Regression), and a customized Support Vector Machine, with recall values of 92%, 92%, and 94%, respectively. SHAP was applied to the best-performing model to improve interpretability. Previous methods reported recall values between 5.6% and 40%; this approach exceeds those benchmarks, reaching 94%. Existing research emphasizes accuracy and relies on oversampling, which is inappropriate for sensitive medical datasets. The trade-off is a slight increase in false positives, which is tolerable because the cost of missing a stroke patient far outweighs the cost of flagging a healthy one.
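As an illustration of the soft-voting idea and of the recall versus false-positive trade-off, the sketch below averages per-model stroke probabilities and applies a decision threshold. The probabilities, the `soft_vote` helper, and the use of a lowered threshold are assumptions made for this example; the abstract does not state how the models were tuned.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average each patient's positive-class probability across models,
    then label the patient 1 (stroke) if the average reaches the threshold."""
    n_models = len(prob_lists)
    averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


# Hypothetical stroke probabilities for four patients from two models,
# standing in for Gaussian Naive Bayes and Logistic Regression.
nb_probs = [0.90, 0.40, 0.20, 0.10]
lr_probs = [0.70, 0.30, 0.40, 0.10]
y_true = [1, 1, 0, 0]  # made-up ground truth

default_pred = soft_vote([nb_probs, lr_probs])                  # threshold 0.5
lenient_pred = soft_vote([nb_probs, lr_probs], threshold=0.25)  # assumed lower threshold

print(default_pred)  # misses the second stroke case
print(lenient_pred)  # catches both stroke cases, flags one healthy patient
```

In this toy case the lower threshold recovers the missed stroke case at the price of one false positive, mirroring the trade-off described above.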
Conclusions:
The research demonstrates that focusing on recall as the evaluation metric is effective for stroke prediction because it minimizes false negatives. To facilitate practical implementation, a mobile application incorporating the best-performing model is also presented. The system is proposed as a primary screening tool that can supplement doctors in stroke diagnosis and prediction.
Neuroscience Informatics
Surgery, Radiology and Imaging, Information Systems, Neurology, Artificial Intelligence, Computer Science Applications, Signal Processing, Critical Care and Intensive Care Medicine, Health Informatics, Clinical Neurology, Pathology and Medical Technology