Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets

Neuroscience Informatics | Pub Date: 2025-06-18 | DOI: 10.1016/j.neuri.2025.100215

Abrar Faiaz Eram, Aliva Sadnim Mahmud, Marwan Mostafa Khadem, Md Amimul Ihsan

Volume 5, Issue 3, Article 100215 | https://www.sciencedirect.com/science/article/pii/S2772528625000305

Citations: 0

Abstract



Background:

Stroke, characterized by interrupted blood flow to the brain, carries a significant risk of death and can severely impair quality of life. While machine learning approaches show promise for stroke prediction, current research often addresses dataset imbalance with synthetic data (oversampling), which may compromise model performance on real patients in clinical settings.
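To make the imbalance problem concrete, the following minimal Python sketch uses a hypothetical cohort of 1,000 patients with 50 stroke cases (5% prevalence; illustrative numbers, not the paper's dataset). A trivial classifier that always predicts "no stroke" scores high accuracy while detecting none of the strokes, which is why accuracy alone can be misleading on data like this.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual stroke cases (label 1) that the model predicts as stroke."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical cohort: 1,000 patients, 50 with stroke (5% prevalence).
y_true = [1] * 50 + [0] * 950

# A trivial baseline that always predicts "no stroke" (label 0).
y_pred = [0] * len(y_true)

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.95
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
```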

Method:

This study proposes an alternative approach that uses recall as the primary evaluation metric for stroke prediction, specifically to reduce false negatives. In stroke diagnosis, a missed detection can lead to severe consequences or death, so recall is crucial: it directly measures the proportion of actual stroke cases that the model identifies.
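For reference, recall is the standard ratio below, where TP counts stroke cases the model flags and FN counts stroke cases it misses. The counts in the example are hypothetical. False positives (FP) do not appear in the formula, so raising recall can come at the cost of more false alarms.

```latex
\[
\mathrm{Recall} = \frac{TP}{TP + FN},
\qquad \text{e.g. } TP = 45,\; FN = 5 \;\Rightarrow\; \mathrm{Recall} = \frac{45}{45 + 5} = 0.90
\]
```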

Results:

Three models performed best: Logistic Regression, a soft-voting ensemble (combining Gaussian Naive Bayes and Logistic Regression), and a customized Support Vector Machine, with recall values of 92%, 92%, and 94%, respectively. SHAP was applied to the best-performing model to improve interpretability. Previous methods reported recall values between 5.6% and 40%; the best model in this study (94%) outperformed these benchmarks. Current research emphasizes accuracy and relies on oversampling, which is inappropriate for sensitive medical datasets. The trade-off is a slight increase in false positives, which is tolerable because the cost of missing a stroke patient far outweighs the cost of a false alarm.
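The soft-voting idea can be sketched in plain Python: each model outputs a probability of stroke, the ensemble averages those probabilities, and a threshold turns the average into a prediction. The probabilities below are hypothetical stand-ins for the outputs of already-fitted Gaussian Naive Bayes and Logistic Regression models, and the 0.2 threshold is an arbitrary illustrative value, not one reported in the paper. scikit-learn offers the same averaging through `VotingClassifier(voting="soft")`; this sketch avoids that dependency and is not the authors' implementation.

```python
def soft_vote(prob_lists, weights=None):
    """Soft voting: weighted mean of each model's predicted stroke probability, per sample."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def confusion_counts(y_true, y_pred):
    """Return (true positives, false negatives, false positives) for label 1 = stroke."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp, fn, fp


# Hypothetical stroke probabilities from two already-fitted models.
p_gnb = [0.90, 0.40, 0.30, 0.20, 0.35, 0.05]  # e.g. Gaussian Naive Bayes
p_lr = [0.80, 0.30, 0.20, 0.10, 0.25, 0.15]   # e.g. Logistic Regression
y_true = [1, 1, 1, 0, 0, 0]

p_ens = soft_vote([p_gnb, p_lr])  # approximately [0.85, 0.35, 0.25, 0.15, 0.30, 0.10]

# Standard 0.5 cut-off versus a lower, recall-oriented threshold (illustrative value).
for threshold in (0.5, 0.2):
    y_pred = [int(p >= threshold) for p in p_ens]
    tp, fn, fp = confusion_counts(y_true, y_pred)
    print(f"threshold={threshold}: recall={tp / (tp + fn):.2f}, false positives={fp}")
# threshold=0.5: recall=0.33, false positives=0
# threshold=0.2: recall=1.00, false positives=1
```

In this toy example, lowering the threshold catches all three stroke cases at the cost of one false positive, the same kind of trade-off the results accept.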
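To give a sense of what SHAP reports, the sketch below uses the closed form that SHAP values take for a linear model with independent features: each feature's contribution is its weight times its deviation from the background mean, on the log-odds scale. This applies to a logistic regression; the paper's best model is a customized SVM, so treat this as an illustration of the idea rather than the authors' analysis. The feature names, weights, bias, and means are all hypothetical.

```python
import math

# Hypothetical logistic-regression parameters on the log-odds scale (not from the paper).
weights = {"age": 0.07, "avg_glucose_level": 0.004, "hypertension": 0.5}
bias = -7.0

# Hypothetical background (training-set) means for each feature.
background_mean = {"age": 43.0, "avg_glucose_level": 106.0, "hypertension": 0.1}


def log_odds(x):
    """Linear model output f(x) = b + sum_j w_j * x_j."""
    return bias + sum(weights[f] * x[f] for f in weights)


def linear_shap(x):
    """SHAP values for a linear model with independent features: phi_j = w_j * (x_j - E[x_j])."""
    return {f: weights[f] * (x[f] - background_mean[f]) for f in weights}


patient = {"age": 67.0, "avg_glucose_level": 228.0, "hypertension": 1.0}
phi = linear_shap(patient)
base_value = log_odds(background_mean)  # E[f(X)] equals f(E[X]) because f is linear

# Contributions in log-odds units, largest magnitude first.
for feature, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature:>18}: {value:+.3f}")

# Local accuracy: base value plus all contributions reproduces the model output.
assert math.isclose(base_value + sum(phi.values()), log_odds(patient))
print(f"predicted stroke probability = {1 / (1 + math.exp(-log_odds(patient))):.3f}")
```

The simple closed form holds only on the log-odds scale, where the model is linear; explaining the predicted probability, or a nonlinear SVM, requires a general SHAP estimator instead.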

Conclusions:

The study demonstrates that focusing on recall as the evaluation metric for stroke prediction is effective at minimizing false-negative predictions. To facilitate practical use, a mobile application incorporating the best-performing model is also provided. The system is proposed as a primary screening tool that can supplement doctors in stroke diagnosis and prediction.


Source journal

Neuroscience Informatics. Subject areas: Surgery, Radiology and Imaging, Information Systems, Neurology, Artificial Intelligence, Computer Science Applications, Signal Processing, Critical Care and Intensive Care Medicine, Health Informatics, Clinical Neurology, Pathology and Medical Technology

Self-citation rate: 0.00%

Review time: 57 days