Revolutionizing heart attack prognosis: Introducing an innovative regression model for prediction

Q1 Medicine

Informatics in Medicine Unlocked | Pub Date: 2025-01-01 | DOI: 10.1016/j.imu.2025.101664

Hanaa Albanna, Madhav Raj Theeng Tamang, Chandan Patel, Mhd Saeed Sharif

Volume 57, Article 101664. Full text: https://www.sciencedirect.com/science/article/pii/S2352914825000528

Citations: 0

Abstract

Objective:

Heart attack prediction using machine learning is crucial for preemptive action and personalized healthcare. This research aims to predict heart attacks by applying machine learning to a diverse range of patient data, including demographic, lifestyle, and physiological factors, which helps to create robust and generalizable predictions. In addition, various models that balance accuracy with interpretability are presented, emphasizing early detection and proactive intervention. This cross-disciplinary approach is expected to underline the role of machine learning in mitigating the heart disease burden and optimizing healthcare resources.

Methods:

This study explores the application of machine learning techniques for predicting heart attack risk using structured clinical data. A range of classification models — Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) — were selected based on their proven effectiveness in prior healthcare prediction studies and their balance between accuracy and interpretability. The methodology involved comprehensive data preprocessing, class imbalance handling, and hyperparameter tuning to optimize model performance. Performance metrics included Accuracy, Precision, Recall, F1-score, and AUC-ROC. Exploratory Data Analysis (EDA) was conducted to assess the role of variables such as BMI, age, and glucose levels in predicting stroke, a proxy used for heart attack due to dataset limitations.
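The threshold-based metrics named above (Accuracy, Precision, Recall, F1-score) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the names `ConfusionCounts` and `classification_metrics` and the example counts are hypothetical, chosen only for illustration. AUC-ROC is left out because it depends on ranked prediction scores rather than on the four counts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; the positive class is the event of interest."""
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    fn: int  # positives predicted negative (missed cases)
    tn: int  # negatives predicted negative


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts (not from the paper), used only to show the arithmetic.
example = classification_metrics(ConfusionCounts(tp=30, fp=20, fn=10, tn=940))
```

With these made-up counts, accuracy is 0.97 while recall is only 0.75, which shows how the two metrics can diverge when positives are rare.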

Results:

The SVM and LR models achieved the highest accuracy (95.08%), followed by RF (94.86%) and DT (91.46%). Despite high accuracy, key challenges were observed:

Class Imbalance: Only 249 cases in the dataset represented positive stroke outcomes, resulting in poor recall for minority class predictions. This reduced the model’s sensitivity to actual stroke cases, a significant limitation in clinical scenarios where false negatives can be life-threatening.

Data-Label Inconsistency: Although the study is framed as predicting heart attacks, the dataset pertains to stroke prediction. This misalignment creates confusion in the clinical relevance of the findings and weakens the generalizability of the models for heart attack risk assessment.

Lack of Model Interpretability in Practice: Though LIME and SHAP were cited as tools to ensure model transparency, they were not implemented or evaluated. This limits clinicians’ trust in the model’s predictions—an essential factor for real-world adoption.
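To make the class-imbalance point concrete, the sketch below scores a trivial classifier that always predicts "no stroke". The 249 positive cases come from the abstract; the total record count is NOT stated there, so `N_TOTAL` is a purely hypothetical value and the resulting numbers are illustrative only. The function name `majority_baseline` is likewise an assumption, and the abstract's reported accuracies were presumably measured on a held-out split, which this sketch does not model.

```python
def majority_baseline(n_positive: int, n_total: int) -> dict[str, float]:
    """Score a classifier that predicts the negative class for every record."""
    if n_total <= 0 or not 0 <= n_positive <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_positive <= n_total")
    n_negative = n_total - n_positive
    true_positives = 0  # the baseline never predicts the positive class
    return {
        # Every negative record is classified correctly, every positive is missed.
        "accuracy": n_negative / n_total,
        "recall": true_positives / n_positive if n_positive else 0.0,
    }


N_POSITIVE = 249  # positive stroke cases reported in the abstract
N_TOTAL = 5000    # ASSUMPTION: hypothetical dataset size, not given in the abstract

baseline = majority_baseline(N_POSITIVE, N_TOTAL)
print(f"accuracy={baseline['accuracy']:.4f} recall={baseline['recall']:.4f}")
```

Under this assumed dataset size, the baseline reaches roughly 95% accuracy while detecting no stroke cases at all. That is why accuracy alone is a weak signal here and recall on the minority class deserves separate reporting.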

Conclusion:

This research shows how machine learning can play a meaningful role in improving how we predict heart attacks and ultimately help improve patient care. The results demonstrated that even well-known models like Support Vector Machine and Logistic Regression can perform very well when applied to structured health data. It also became clear that everyday variables — such as age, BMI, glucose levels, and smoking habits — carry important signals for assessing cardiovascular risk. But while the models achieved high accuracy, the study also revealed that performance alone is not enough for real-world use. For machine learning to be truly useful in healthcare, models need to handle imbalanced data properly, offer transparent and understandable predictions, and stay aligned with clinical needs. This work not only highlights the potential of AI to transform predictive healthcare but also reminds us of the practical challenges that must be addressed along the way. Clear goals, interpretable results, and thoughtful integration into clinical practice are all key to making these tools safe, effective, and trusted by healthcare professionals.

Source journal

Informatics in Medicine Unlocked (Medicine: Health Informatics)

CiteScore: 9.50

Self-citation rate: 0.00%

Articles published: 282

Review time: 39 days

Journal description: Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.