Hyperparameter optimization for cardiovascular disease data-driven prognostic system.

IF 6 · Zone 4, Computer Science · Q1 Arts and Humanities

Visual Computing for Industry, Biomedicine, and Art Pub Date : 2023-08-01 DOI:10.1186/s42492-023-00143-6

Jayson Saputra, Cindy Lawrencya, Jecky Mitra Saini, Suharjito Suharjito


Citation count: 0

Abstract

Predicting and diagnosing cardiovascular diseases (CVDs) from medical examinations, patient symptoms, and other information is among the biggest challenges in medicine. About 17.9 million people die from CVDs annually, accounting for 31% of all deaths worldwide. With a timely prognosis and thorough consideration of a patient's medical history and lifestyle, it is possible to predict CVDs and take preventive measures to eliminate or control these life-threatening diseases. In this study, we used patient datasets from a major hospital in the United States as prognostic factors for CVD. The data were obtained by monitoring a total of 918 adult patients aged 28-77 years. We present a data mining modeling approach, implemented in the Orange data mining software, to analyze model performance and classification accuracy, together with the number of clusters found by unsupervised machine learning (ML), on the Cardiovascular Disease Prognostic datasets. Several classification techniques were evaluated, including k-nearest neighbors, support vector machine, random forest, artificial neural network (ANN), naïve Bayes, logistic regression, stochastic gradient descent (SGD), and AdaBoost. To determine the number of clusters, several unsupervised ML clustering methods were used, including k-means, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN). The results showed that SGD and ANN achieved the best model performance and classification accuracy, each scoring 0.900 on the Cardiovascular Disease Prognostic datasets. Most clustering methods, including k-means and hierarchical clustering, indicated that the Cardiovascular Disease Prognostic datasets can be divided into two clusters. The accuracy of a CVD prognosis depends on the accuracy of the underlying model: the more accurate the model, the better it can predict which patients are at risk for CVD.
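As a rough illustration of the kind of hyperparameter search the title refers to, the sketch below tunes the number of neighbors for a k-nearest neighbors classifier by cross-validated accuracy. It is not the authors' workflow: the study used the Orange data mining software, whereas this is plain Python on synthetic two-class data that merely stands in for patient records. The function names (`make_synthetic`, `knn_predict`, `cv_accuracy`, `grid_search`), the candidate values of k, and the 5-fold split are all assumptions made for the example.

```python
import random
from collections import Counter


def make_synthetic(n=200, seed=0):
    """Two noisy Gaussian blobs standing in for patient records (NOT real data)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((features, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], x))

    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_accuracy(data, k, folds=5):
    """Mean k-NN accuracy over `folds` interleaved cross-validation splits."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th row, starting at f
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


def grid_search(data, candidates=(1, 3, 5, 7, 9, 11)):
    """Return (best_k, scores); scores maps each candidate k to its CV accuracy."""
    scores = {k: cv_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    data = make_synthetic()
    best_k, scores = grid_search(data)
    for k, acc in scores.items():
        print(f"k={k:2d}  accuracy={acc:.3f}")
    print("best k:", best_k)
```

Only odd values of k are tried so that a two-class vote cannot tie. The same loop structure would apply to any of the other classifiers listed in the abstract, with the grid running over that model's own hyperparameters instead of k.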


