Comparative analysis of supervised learning algorithms for prediction of cardiovascular diseases.

IF 1.8 | Zone 4, Medicine | JCR Q4, ENGINEERING, BIOMEDICAL

Technology and Health Care Pub Date : 2024-01-01 DOI:10.3233/THC-248021

Yifeng Dou, Jiantao Liu, Wentao Meng, Yingchao Zhang

Pages: 241-251
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11191474/pdf/

Citations: 0

Abstract


Comparative analysis of supervised learning algorithms for prediction of cardiovascular diseases.


Background: With the advent of artificial intelligence technology, machine learning algorithms have been widely used in the area of disease prediction.

Objective: Cardiovascular disease (CVD) seriously jeopardizes human health worldwide. An effective CVD prediction model is therefore of great significance for controlling the risk of the disease and safeguarding the physical and mental health of the population.

Methods: Using the UCI heart disease dataset as an example, single machine learning prediction models were constructed first. Six methods, including Pearson correlation, chi-squared, RFE and LightGBM, were then used together for feature screening. On top of the base classifiers, Soft Voting fusion and Stacking fusion were carried out to build a prediction model for cardiovascular diseases, with the aim of enabling early warning and disease intervention for high-risk populations. To address the data imbalance problem, the SMOTE method was applied to the dataset, and the predictive performance of the models was analyzed along multiple dimensions and with multiple indicators.
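The soft-voting step mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `soft_vote`, the equal default weights, and the example probabilities are assumptions made for illustration only. The sketch averages the class probabilities reported by several base classifiers for one sample and picks the class with the highest average.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    probas: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Soft voting for ONE sample (hypothetical helper, not from the paper).

    probas[i][k] is classifier i's predicted probability for class k.
    Returns (index of the winning class, weighted-average probabilities).
    """
    if not probas:
        raise ValueError("need at least one classifier")
    n_classes = len(probas[0])
    if any(len(p) != n_classes for p in probas):
        raise ValueError("all classifiers must report the same classes")
    if weights is None:
        # Assumption: every base classifier gets the same weight.
        weights = [1.0] * len(probas)
    if len(weights) != len(probas):
        raise ValueError("one weight per classifier is required")

    total = float(sum(weights))
    averaged = [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda k: averaged[k])
    return predicted, averaged


# Made-up outputs of three base classifiers for one patient;
# columns are P(no disease), P(disease).
example = [[0.90, 0.10], [0.45, 0.55], [0.45, 0.55]]
label, avg = soft_vote(example)
print(label, [round(p, 3) for p in avg])
```

In this made-up example two of the three classifiers lean towards class 1, so a majority (hard) vote would pick class 1, while the averaged probabilities favor class 0. That difference is the reason soft voting takes classifier confidence into account.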

Results: Among the single-classifier models, the MLP algorithm performed best on the preprocessed heart disease dataset. After feature selection, five features were eliminated. The ENSEM_SV algorithm, which combines the base classifiers and determines the prediction by soft voting on their outputs, achieved the best value on five metrics, including Accuracy, Jaccard_Score, Hamm_Loss and AUC, with an AUC of 0.951. The RF, ET, GBDT, and LGB algorithms were used as base classifiers in the first-stage sub-model, and the AB algorithm was selected as the second-stage model. The resulting ensemble algorithm ENSEM_ST, obtained by Stacking fusion of the two stages, performed best on 7 indicators, including Accuracy, Sensitivity, F1_Score and Mathew_Corrcoef, with an AUC of 0.952. The classification performance of the algorithms was also compared across different training-set proportions. The results indicated that both fusion models predicted better than the single models, and that the overall performance of ENSEM_ST fusion was stronger than that of ENSEM_SV fusion.
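Several of the indicators named in the Results can be computed directly from a binary confusion matrix. The sketch below uses the standard textbook definitions; the abstract does not state the exact formulas the authors used, so the function name `binary_metrics`, the zero-denominator convention, and the toy labels are assumptions. AUC is omitted because it needs predicted scores rather than hard labels.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard binary-classification metrics (labels are 0 or 1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    def ratio(num: float, den: float) -> float:
        # Assumption: report 0.0 when a denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": ratio(tp, tp + fn),            # recall of the positive class
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),           # positive-class Jaccard index
        "hamming_loss": (fp + fn) / n,                # fraction of wrong labels
        "mcc": ratio(tp * tn - fp * fn, mcc_den),     # Matthews correlation coefficient
    }


# Toy labels, not data from the paper: 3 TP, 1 FN, 3 TN, 1 FP.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a single binary label per sample, the Hamming loss equals one minus the accuracy, which is why a model that is best on Accuracy is also best on Hamm_Loss.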

Conclusions: The fusion model established in this study significantly improved overall classification accuracy and stability. It has good application value in the predictive analysis of CVD diagnosis and can provide a valuable reference for disease diagnosis and intervention strategies.


Source journal: Technology and Health Care (HEALTH CARE SCIENCES & SERVICES; ENGINEERING, BIOMEDICAL)
CiteScore: 2.10
Self-citation rate: 6.20%
Articles published: 282
Review time: >12 weeks

About the journal: Technology and Health Care is intended to serve as a forum for the presentation of original articles and technical notes, observing rigorous scientific standards. Furthermore, upon invitation, reviews, tutorials, discussion papers and minisymposia are featured. The main focus of THC is related to the overlapping areas of engineering and medicine. The following types of contributions are considered:

1. Original articles: New concepts, procedures and devices associated with the use of technology in medical research and clinical practice are presented to a readership with a widespread background in engineering and/or medicine. In particular, the clinical benefit deriving from the application of engineering methods and devices in clinical medicine should be demonstrated. Typically, full-length original contributions have a length of 4000 words, with due allowance for figures and tables.
2. Technical Notes and Short Communications: Technical Notes relate to novel technical developments with relevance for clinical medicine. In Short Communications, clinical applications are briefly described. Both Technical Notes and Short Communications typically have a length of 1500 words.
3. Reviews and Tutorials (upon invitation only): Tutorial and educational articles are presented for persons with a primarily medical background on principles of engineering with particular significance for biomedical applications, and vice versa. The Editorial Board is responsible for the selection of topics.
4. Minisymposia (upon invitation only): Under the leadership of a Special Editor, controversial or important issues relating to health care are highlighted and discussed by various authors.
5. Letters to the Editors: Discussions or short statements (not indexed).