Predictive Data Mining Approaches for Diabetes Mellitus Type II Disease

S. Ibrahim, S. Khairi
Journal: International Journal of Global Optimization and Its Application
Published: 2022-06-30
DOI: 10.56225/ijgoia.v1i2.22 (https://doi.org/10.56225/ijgoia.v1i2.22)
Citations: 2

Abstract

Diabetes, which is caused by abnormal insulin secretion in the human body, is among the major public health problems, especially in developing countries. It is a common disease that can lead to several health complications and to mortality. In Malaysia, most cases are categorized as Diabetes Mellitus (DM) Type II. The number of patients with diabetes increases from year to year, driven by unhealthy lifestyle factors such as smoking, overweight and hypertension. This study therefore aims to identify the influential factors that may contribute to DM Type II by comparing the performance of different data mining approaches. Between April 2017 and November 2018, 684 patients from a public clinic were included in this retrospective cross-sectional study. Four predictive models are compared: Logistic Regression, Decision Tree, Naïve Bayes, and Artificial Neural Network (ANN). Model performance is evaluated using two error measures (Average Squared Error and Misclassification Rate) together with the ROC Index. The results show that stepwise Logistic Regression outperformed the other predictive models, with a classification accuracy of 73% and correct prediction of the positive outcome (Y = 1) in 90% of cases. The significant inputs for predicting DM Type II (Y = 1) are Hypertension and Glycated Hemoglobin (HbA1c), and the model has a Root Mean Squared Error (RMSE) of 0.424. The findings may help to improve strategies and planning for diabetes in Malaysia.
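The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 classification threshold and the toy data are assumptions made here for illustration, and the ROC Index is taken to be the area under the ROC curve. If the RMSE is the square root of the Average Squared Error on the same data, the reported RMSE of 0.424 would correspond to an ASE of about 0.180, and a 73% accuracy to a misclassification rate of 27%.

```python
import math

def average_squared_error(y_true, p_hat):
    """Mean squared difference between 0/1 labels and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def misclassification_rate(y_true, p_hat, threshold=0.5):
    """Share of cases whose thresholded prediction differs from the label.
    The 0.5 threshold is an assumption, not taken from the paper."""
    wrong = sum((p >= threshold) != (y == 1) for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)

def sensitivity(y_true, p_hat, threshold=0.5):
    """Share of true positives (Y = 1) that are predicted as positive."""
    positives = [p for y, p in zip(y_true, p_hat) if y == 1]
    return sum(p >= threshold for p in positives) / len(positives)

def roc_index(y_true, p_hat):
    """Area under the ROC curve: probability that a random positive case
    scores higher than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data, not from the study: labels and predicted P(Y = 1).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
p_hat = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

ase = average_squared_error(y_true, p_hat)
rmse = math.sqrt(ase)  # RMSE is the square root of ASE
mis = misclassification_rate(y_true, p_hat)
print(f"ASE={ase:.4f} RMSE={rmse:.4f} misclassification={mis:.2f}")
print(f"sensitivity={sensitivity(y_true, p_hat):.2f} "
      f"ROC index={roc_index(y_true, p_hat):.3f}")

# Relation between the figures reported in the abstract (simple arithmetic).
implied_ase = 0.424 ** 2          # ASE implied by RMSE = 0.424
implied_misclassification = 1 - 0.73  # implied by 73% accuracy
```

Lower ASE and misclassification rate and a higher ROC Index indicate a better model, so competing models would be ranked by computing these measures on the same held-out data.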