Machine Learning System for Predicting Cardiovascular Disorders in Diabetic Patients

A. Mayya, H. Solieman
Journal of the Russian Universities. Radioelectronics. Published 2022-09-29.
DOI: 10.32603/1993-8985-2022-25-4-116-122 (https://doi.org/10.32603/1993-8985-2022-25-4-116-122)
Citations: 0

Abstract

Machine Learning System for Predicting Cardiovascular Disorders in Diabetic Patients
Introduction. Patients with diabetes are exposed to various cardiovascular risk factors, which increase the risk of cardiac complications. The development of a diagnostic system for diabetes and cardiovascular disease (CVD) is therefore a relevant research task. In addition, identifying the most significant indicators of both diseases may help physicians improve treatment, speed up diagnosis, and reduce its computational cost.

Aim. To classify subjects with different diabetes types and to predict the risk of cardiovascular disease in diabetic patients using machine learning methods by identifying correlated indicators.

Materials and methods. Data from the NHANES database were used after preprocessing and balancing. Machine learning methods were used to classify diabetes based on physical examination data and laboratory data. Feature selection methods were used to derive the most significant indicators for predicting CVD risk in diabetic patients. The performance of the developed classification and prediction models was optimized using several evaluation metrics.

Results. The developed model (Random Forest) achieved an accuracy of 93.1% (based on laboratory data) and 88% (based on physical examination plus laboratory data). The five most common predictors in diabetes and prediabetes were found to be glycohemoglobin, basophil count, triglyceride level, waist size, and body mass index (BMI). These results seem logical, since glycohemoglobin is commonly used to measure the amount of glucose (sugar) bound to hemoglobin in red blood cells. For CVD patients, the most common predictors include eosinophil count (indicative of blood diseases), gamma-glutamyl transferase (GGT), glycohemoglobin, overall oral health, and hand stiffness.

Conclusion. Balancing the dataset and deleting NaN values improved the performance of the developed models. The Random Forest classifier (RFC) and XGBoost models achieved higher accuracy using gradient descent to minimize the loss function. The final prediction is made using a weighted majority vote of all the decisions. The result was an automated system for predicting CVD risk in diabetic patients.
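
The preprocessing and voting steps named in the Conclusion can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the data were balanced, so random undersampling to the minority class size is assumed here, and the function names (drop_incomplete, undersample, weighted_majority_vote), the sample rows, and the voter weights are all hypothetical.

```python
import math
import random
from collections import defaultdict


def drop_incomplete(rows):
    """Keep only (features, label) rows with no missing feature (None or NaN)."""
    def missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [(x, y) for x, y in rows if not any(missing(v) for v in x)]


def undersample(rows, seed=0):
    """Balance classes by randomly sampling each class down to the minority size.

    Assumption: the abstract only says the dataset was balanced; undersampling
    is one common way to do that, chosen here purely for illustration.
    """
    by_label = defaultdict(list)
    for x, y in rows:
        by_label[y].append((x, y))
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def weighted_majority_vote(predictions, weights):
    """Return the label whose voters carry the largest total weight."""
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical toy data: (feature tuple, class label).
rows = [
    ((5.4, 88.0), 0),
    ((float("nan"), 92.0), 0),   # dropped: NaN feature
    ((7.9, None), 1),            # dropped: missing feature
    ((8.1, 110.0), 1),
    ((5.1, 80.0), 0),
    ((5.6, 85.0), 0),
]
clean = drop_incomplete(rows)
balanced = undersample(clean)

# Three hypothetical voters with hypothetical weights.
decision = weighted_majority_vote(["cvd", "no_cvd", "no_cvd"], [0.6, 0.25, 0.25])
```

In this toy run the single heavily weighted voter outweighs the two lighter ones, which is the difference between a weighted vote and a plain majority vote. How the weights are derived (for example, from validation accuracy) is not stated in the abstract.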