Transfer learning-based hybrid VGG16-machine learning approach for heart disease detection with explainable artificial intelligence
Authors: Eshetie Gizachew Addisu, Tahayu Gizachew Yirga, Hailu Gizachew Yirga, Alemu Demeke Yehuala
Journal: Frontiers in Artificial Intelligence, volume 8, article 1504281 (JCR Q2, Computer Science, Artificial Intelligence; impact factor 3.0)
Published: 2025-02-25 (eCollection 2025)
Publication type: Journal Article
DOI: 10.3389/frai.2025.1504281 (https://doi.org/10.3389/frai.2025.1504281)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11893864/pdf/
Citation count: 0
Abstract
Heart disease is a leading cause of mortality worldwide, making accurate early detection essential for effective treatment and management. This study introduces a novel hybrid approach to heart disease detection that combines transfer learning with the VGG16 convolutional neural network (CNN) and several machine-learning classifiers. A conditional tabular generative adversarial network (CTGAN) was used to generate synthetic samples from real datasets; the quality of the synthetic data was assessed with statistical metrics, correlation analysis, and domain-expert review. The dataset is tabular, with 13 features per record; each record is reshaped into an image-like format and resized to 224×224×3 to meet the input requirements of VGG16. VGG16 then extracts features from these images, and the extracted features are fused with the original tabular data. The combined feature set is used to train various machine-learning models, including Support Vector Machines (SVM), Gradient Boosting, Random Forest, Logistic Regression, K-nearest neighbors (KNN), and Decision Trees. Among these models, the VGG16-Random Forest hybrid achieved notable results across all evaluation metrics: 92% accuracy, 91.3% precision, 92.2% recall (sensitivity), 91.82% specificity, and a 91.75% F1-score. The hybrid models were also evaluated on unseen datasets to assess generalizability, with the VGG16-Random Forest combination showing relatively promising results. In addition, SHAP values are used to explain the model, showing how much each feature contributes to its predictions. Overall, the hybrid VGG16-ML approach shows potential for accurate and interpretable heart disease detection that could support clinical decision-making.
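
The evaluation metrics listed in the abstract can all be derived from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the `ConfusionMatrix` class, the `classification_metrics` function, and the example counts are hypothetical illustrations and are not taken from the paper. It also makes explicit that recall and sensitivity are the same quantity, which is why the abstract reports the same value (92.2%) for both.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = heart disease)."""
    tp: int  # diseased cases predicted as diseased
    fp: int  # healthy cases predicted as diseased
    tn: int  # healthy cases predicted as healthy
    fn: int  # diseased cases predicted as healthy


def classification_metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    """Compute the metrics named in the abstract from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. the data contains both
    classes and the classifier predicts at least one positive case.
    """
    total = cm.tp + cm.fp + cm.tn + cm.fn
    accuracy = (cm.tp + cm.tn) / total
    precision = cm.tp / (cm.tp + cm.fp)
    recall = cm.tp / (cm.tp + cm.fn)        # identical to sensitivity
    specificity = cm.tn / (cm.tn + cm.fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for 100 test cases; NOT the paper's results.
example = ConfusionMatrix(tp=45, fp=5, tn=40, fn=10)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score is the harmonic mean of precision and recall, it always lies between the two, which is consistent with the reported 91.75% falling between 91.3% and 92.2%.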