Performance Evaluation of Predictive Machine Learning Models for Diabetic Disease Using Python
M. Bhattacharya, D. Datta
2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT), 7 October 2022
DOI: 10.1109/GCAT55367.2022.9972220
Citations: 0
Abstract
The discovery of knowledge from medical databases is a beneficial but challenging task for diagnosis. For example, patients with high blood glucose need to be diagnosed because they may fall within the group with diabetes mellitus. Predicting diabetes mellitus is an essential research problem in the medical domain. With the advent of artificial intelligence and machine learning, such prediction overcomes hurdles faced by data mining approaches to similar tasks. Data mining extracts knowledge from information stored in databases and produces an understandable description of patterns. Many studies have already predicted diabetes using traditional machine learning algorithms such as artificial neural networks, Naïve Bayes, and decision trees. However, diabetes must be determined with a certain degree of confidence, as measured by accuracy or other performance measures. In this context, this work applies decision tree, support vector machine, random forest, k-nearest neighbours, and Naïve Bayes classifiers to classify whether a patient is diabetic or prone to diabetes. The algorithms are evaluated in terms of accuracy score. The dataset used to train and test them is retrieved from the Pima Indian Database. Based on the comparative evaluation, the most important feature for identifying diabetes is extracted. A complete Python implementation was developed for this work.
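The abstract describes comparing classifiers by accuracy score on a held-out test set. The following is a minimal sketch of that evaluation protocol, not the authors' code. It covers only two of the five models (k-nearest neighbours and a Gaussian variant of Naïve Bayes) and uses the standard library only. The records are hypothetical toy values for two features (glucose, BMI), not the Pima data. The helper names (`accuracy_score`, `knn_predict`, `fit_gaussian_nb`, `nb_predict`) are illustrative and hand-written here, not a library API.

```python
import math
from collections import Counter

# Hypothetical toy records (NOT the Pima data): (glucose, BMI) -> label,
# where 1 = diabetic and 0 = non-diabetic. Values are made up for illustration.
TRAIN_X = [(85, 26.6), (89, 28.1), (95, 24.0), (100, 25.5),
           (168, 43.1), (183, 23.3), (150, 35.0), (160, 38.5)]
TRAIN_Y = [0, 0, 0, 0, 1, 1, 1, 1]
TEST_X = [(90, 27.0), (175, 36.0), (98, 30.0), (155, 33.0)]
TEST_Y = [0, 1, 0, 1]


def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training points closest to x.

    Uses plain Euclidean distance on unscaled features, so glucose
    dominates; a real pipeline would standardize features first.
    """
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train_x, train_y):
    """Per-class prior plus per-feature mean and variance (Gaussian Naive Bayes)."""
    model = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small epsilon keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(train_x), means, variances)
    return model


def nb_predict(model, x):
    """Label with the highest log prior plus summed Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


# Fit on the training split, predict the test split, compare accuracy.
nb_model = fit_gaussian_nb(TRAIN_X, TRAIN_Y)
scores = {
    "k-NN (k=3)": accuracy_score(
        TEST_Y, [knn_predict(TRAIN_X, TRAIN_Y, x) for x in TEST_X]),
    "Gaussian NB": accuracy_score(
        TEST_Y, [nb_predict(nb_model, x) for x in TEST_X]),
}
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

On a toy set this small, both models separate the classes easily. The point is the protocol: fit each classifier on the same training split, score it on the same test split, and compare the resulting accuracies. In practice, a library such as scikit-learn provides all five classifiers named in the abstract.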