Comparative analysis of supervised learning algorithms for prediction of cardiovascular diseases.

IF 16.4 · Zone 1 (Chemistry) · JCR Q1 (CHEMISTRY, MULTIDISCIPLINARY)
Yifeng Dou, Jiantao Liu, Wentao Meng, Yingchao Zhang
DOI: 10.3233/THC-248021 (https://doi.org/10.3233/THC-248021)
Publication date: 2024-01-01 · Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11191474/pdf/
Citations: 0

Abstract

Comparative analysis of supervised learning algorithms for prediction of cardiovascular diseases.

Background: With the advent of artificial intelligence technology, machine learning algorithms have been widely used in the area of disease prediction.

Objective: Cardiovascular disease (CVD) seriously jeopardizes human health worldwide. An effective CVD prediction model is therefore needed; it would be of great significance for controlling the risk of the disease and safeguarding the physical and mental health of the population.

Methods: Taking the UCI heart disease dataset as an example, a single machine learning prediction model was first constructed. Subsequently, six methods, including Pearson correlation, chi-squared, RFE, and LightGBM, were used in combination for feature screening. On top of the base classifiers, Soft Voting fusion and Stacking fusion were carried out to build a prediction model for cardiovascular diseases, with the aim of enabling early warning and disease intervention for high-risk populations. To address the data imbalance problem, the SMOTE method was applied to the dataset, and the predictive performance of the model was analyzed across multiple dimensions and indicators.
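The SMOTE step named in the Methods can be illustrated with a minimal sketch. This is a simplified, standard-library version of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' code or the imbalanced-learn implementation. The function name `smote_sketch`, the toy points, the neighbour count `k`, and the seed are all hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical): 4 minority samples against 10 majority samples.
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
majority_count = 10

new_points = smote_sketch(minority, majority_count - len(minority))
balanced_minority = minority + new_points
```

After this step the minority class has as many samples as the majority class. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into evaluation data.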

Results: In the single classifier model, the MLP algorithm performed best on the preprocessed heart disease dataset. After feature selection, five features were eliminated. The ENSEM_SV algorithm, which combines the base classifiers and determines the prediction by soft voting on their outputs, achieved the best value on five metrics, including Accuracy, Jaccard_Score, Hamm_Loss, and AUC, with an AUC of 0.951. The RF, ET, GBDT, and LGB algorithms were used as base classifiers in the first-stage sub-model, and the AB algorithm was selected as the second-stage model. The resulting ensemble algorithm ENSEM_ST, obtained by Stacking fusion of the two stages, performed best on 7 indicators, including Accuracy, Sensitivity, F1_Score, and Mathew_Corrcoef, with an AUC of 0.952. Furthermore, the algorithms' classification performance was compared across different training-set proportions. The results indicated that both fusion models predicted better than the single models, and that ENSEM_ST fusion was stronger overall than ENSEM_SV fusion.
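The two fusion schemes described in the Results can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' pipeline. Reading RF, ET, GBDT, and AB as random forest, extra trees, gradient-boosted trees, and AdaBoost is an assumption. LGB is omitted to avoid a LightGBM dependency. The data are synthetic rather than the UCI heart disease dataset, and every hyperparameter is arbitrary. The scores it produces therefore say nothing about the reported 0.951 and 0.952 AUC values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.metrics import (
    accuracy_score,
    hamming_loss,
    jaccard_score,
    matthews_corrcoef,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): binary labels, arbitrary feature count.
X, y = make_classification(
    n_samples=400, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# First-stage base classifiers (LGB left out, see the note above).
base = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

models = {
    # Soft voting: average the base classifiers' predicted probabilities.
    "ENSEM_SV": VotingClassifier(estimators=base, voting="soft"),
    # Stacking: out-of-fold base predictions feed a second-stage AdaBoost.
    "ENSEM_ST": StackingClassifier(
        estimators=base,
        final_estimator=AdaBoostClassifier(random_state=0),
        cv=5,
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "jaccard": jaccard_score(y_test, pred),
        "hamming_loss": hamming_loss(y_test, pred),
        "mcc": matthews_corrcoef(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
    }
```

Printing `results` shows the same family of indicators the abstract lists (Accuracy, Jaccard_Score, Hamm_Loss, Mathew_Corrcoef, AUC) for each ensemble. For binary labels, the Hamming loss equals one minus the accuracy, so those two indicators always rank models identically.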

Conclusions: The fusion model established in this study substantially improved overall classification accuracy and stability. It has good application value in the predictive analysis of CVD diagnosis and can provide a valuable reference for disease diagnosis and intervention strategies.

Source journal: Accounts of Chemical Research (Chemistry, Multidisciplinary)
CiteScore: 31.40 · Self-citation rate: 1.10% · Articles published: 312 · Review time: 2 months
About the journal: Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance. Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research, affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.