Performance Evaluation of Predictive Machine Learning Models for Diabetic Disease Using Python

M. Bhattacharya, D. Datta
Published in: 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT)
Publication date: 2022-10-07
DOI: 10.1109/GCAT55367.2022.9972220 (https://doi.org/10.1109/GCAT55367.2022.9972220)
Citations: 0

Abstract

Discovering knowledge from medical databases is both beneficial and challenging for diagnosis. For example, patients with high blood glucose need to be diagnosed, as they fall within the diabetes mellitus group. Predicting diabetes mellitus is therefore an important research problem in the medical domain. With the advent of artificial intelligence and machine learning, this type of prediction removes hurdles faced when data mining is used for similar tasks. In data mining, knowledge is extracted from information stored in a database and an understandable description of patterns is obtained. A large body of research has already predicted diabetes using traditional machine learning algorithms such as artificial neural networks, naïve Bayes, and decision trees. However, diabetes must be determined with a certain degree of confidence, as judged by accuracy or other performance measures. In this context, this work presents machine learning models, namely decision tree, support vector machine, random forest, k-nearest neighbours, and naïve Bayes, as classifiers that predict whether a patient is diabetic or prone to diabetes. The performance of these algorithms is measured in terms of accuracy score. The dataset used to train and test the algorithms is retrieved from the Pima Indian database. Based on the comparative evaluation, the most important feature for identifying diabetes is extracted. Complete Python code has been developed for this research work.
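The accuracy-based comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, which is not reproduced on this page. As assumptions for illustration only, it uses synthetic two-class data in place of the Pima Indian dataset and implements just two of the five named classifiers (k-nearest neighbours and Gaussian naïve Bayes) in pure Python. All function and class names below are hypothetical.

```python
import math
import random


def make_synthetic(n_per_class=150, n_features=4, shift=3.0, seed=0):
    """Two Gaussian classes; class 1 is shifted by `shift` on every feature.

    Assumption: this synthetic data only stands in for a real medical dataset.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out a fraction of the rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class KNearestNeighbours:
    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # k-NN is a lazy learner: fitting only stores the training data.
        self.X, self.y = X, y
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Take the k training rows closest to `row` (squared Euclidean).
            nearest = sorted(
                zip(self.X, self.y),
                key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], row)),
            )[: self.k]
            votes = [label for _, label in nearest]
            preds.append(max(set(votes), key=votes.count))  # majority vote
        return preds


class GaussianNaiveBayes:
    def fit(self, X, y):
        # Per class: log prior plus per-feature mean and variance.
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            means = [sum(col) / len(col) for col in zip(*rows)]
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            def log_posterior(label):
                log_prior, means, variances = self.stats[label]
                return log_prior + sum(
                    -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                    for v, m, var in zip(row, means, variances)
                )
            preds.append(max(self.stats, key=log_posterior))
        return preds


def compare_classifiers(seed=0):
    """Fit each model on the training split and score it on the test split."""
    X, y = make_synthetic(seed=seed)
    X_tr, y_tr, X_te, y_te = train_test_split(X, y, seed=seed)
    models = {"k-nearest neighbours": KNearestNeighbours(k=5),
              "naive Bayes": GaussianNaiveBayes()}
    return {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}


if __name__ == "__main__":
    for name, acc in compare_classifiers().items():
        print(f"{name}: accuracy = {acc:.3f}")
```

Every model is scored on the same held-out split so that the accuracy values are directly comparable. In practice a library such as scikit-learn would likely supply all five classifiers named in the abstract, with real patient records replacing the synthetic rows.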