Performance Evaluation of Predictive Machine Learning Models for Diabetic Disease Using Python

2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT). Pub Date: 2022-10-07. DOI: 10.1109/GCAT55367.2022.9972220

M. Bhattacharya, D. Datta



Abstract

Discovering knowledge from medical databases is both beneficial and challenging for diagnosis. For example, patients with high blood glucose need to be diagnosed because they fall within the diabetes mellitus group. Predicting diabetes mellitus is an important research topic in the medical domain. With the advent of artificial intelligence and machine learning, this type of prediction removes the hurdles faced when data mining is used for similar tasks. In data mining, knowledge is extracted from information stored in a database to obtain an understandable description of patterns. A large body of research has already used traditional machine learning algorithms such as artificial neural networks, naïve Bayes, and decision trees to predict diabetes. However, diabetes must be determined with a certain degree of confidence, as judged by accuracy or other performance measures. In this context, this work presents machine learning models such as decision tree, support vector machine, random forest, k-nearest neighbours, and naïve Bayes as classifiers to determine whether a patient is diabetic or prone to diabetes. The performance of these algorithms is measured in terms of accuracy score. The dataset used to train and test the algorithms is taken from the Pima Indian database. On the basis of the comparative evaluation, the most important feature for identifying diabetes is extracted. Complete Python code has been developed for this research work.
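The comparison described in the abstract (train several classifiers, then rank them by accuracy score on held-out data) can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic stand-in data rather than the Pima Indian dataset, covers only two of the five classifiers (k-nearest neighbours and Gaussian naïve Bayes), and uses only the Python standard library. The feature means, the 70/30 split, k = 5, and all function names are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def gnb_fit(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def gnb_predict(model, x):
    """Return the class with the highest Gaussian log-posterior for x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


# Synthetic stand-in data (NOT the Pima dataset): two hypothetical features
# loosely shaped like glucose and BMI; label 1 marks the "diabetic" class.
rng = random.Random(0)
X, y = [], []
for label, (glucose_mean, bmi_mean) in enumerate([(100.0, 27.0), (150.0, 34.0)]):
    for _ in range(200):
        X.append((rng.gauss(glucose_mean, 15.0), rng.gauss(bmi_mean, 4.0)))
        y.append(label)

# Shuffle once, then hold out 30% of the rows for testing.
indices = list(range(len(X)))
rng.shuffle(indices)
cut = int(0.7 * len(indices))
train, test = indices[:cut], indices[cut:]
X_train, y_train = [X[i] for i in train], [y[i] for i in train]
X_test, y_test = [X[i] for i in test], [y[i] for i in test]

# Score each classifier on the same held-out rows and report best first.
nb_model = gnb_fit(X_train, y_train)
results = {
    "k-nearest neighbours": accuracy_score(
        y_test, [knn_predict(X_train, y_train, x) for x in X_test]),
    "Gaussian naive Bayes": accuracy_score(
        y_test, [gnb_predict(nb_model, x) for x in X_test]),
}
for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: accuracy = {acc:.3f}")
```

In practice, a study like this would more likely rely on a library such as scikit-learn, which provides `DecisionTreeClassifier`, `SVC`, `RandomForestClassifier`, `KNeighborsClassifier`, `GaussianNB`, and `accuracy_score`. Whether the authors used it is not stated in the abstract.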
