用过采样和特征增强的机器学习分类器预测糖尿病疾病

B. Ahamed, Meenakshi S. Arya, V. O. Nancy
{"title":"用过采样和特征增强的机器学习分类器预测糖尿病疾病","authors":"B. Ahamed, Meenakshi S. Arya, V. O. Nancy","doi":"10.2139/ssrn.4060653","DOIUrl":null,"url":null,"abstract":"The technical improvements in healthcare sector today have given rise to many new inventions in the field of artificial intelligence. Patterns for disease identification are carried out, and the onset of prediction of many diseases is detected. Diseases include diabetes mellitus disease, fatal heart diseases, and symptomatic cancer. There are many algorithms that have played a critical role in the prediction of diseases. This paper proposes an ML based approach for diabetes mellitus disease prediction. For diabetes prediction, many ML algorithms are compared and used in the proposed work, and finally the three ML classifiers providing the highest accuracy are determined: RF, GBM, and LGBM. The accuracy of prediction is obtained using two types of datasets. They are Pima Indians dataset and a curated dataset. The ML classifiers LGBM, GB, and RF are used to build a predictive model, and the accuracy of each classifier is noted and compared. In addition to the generalized prediction mechanism, the data augmentation technique is also used, and the final accuracy of prediction is obtained for the classifiers LGBM, GB, and RF. A comparative study and demonstration between augmentation and non-augmentation are also discussed for the two datasets used in order to further improve the performance accuracy for predicting diabetes disease.","PeriodicalId":192934,"journal":{"name":"Adv. Hum. Comput. Interact.","volume":"2022 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation\",\"authors\":\"B. Ahamed, Meenakshi S. Arya, V. O. Nancy\",\"doi\":\"10.2139/ssrn.4060653\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The technical improvements in healthcare sector today have given rise to many new inventions in the field of artificial intelligence. Patterns for disease identification are carried out, and the onset of prediction of many diseases is detected. Diseases include diabetes mellitus disease, fatal heart diseases, and symptomatic cancer. There are many algorithms that have played a critical role in the prediction of diseases. This paper proposes an ML based approach for diabetes mellitus disease prediction. For diabetes prediction, many ML algorithms are compared and used in the proposed work, and finally the three ML classifiers providing the highest accuracy are determined: RF, GBM, and LGBM. The accuracy of prediction is obtained using two types of datasets. They are Pima Indians dataset and a curated dataset. The ML classifiers LGBM, GB, and RF are used to build a predictive model, and the accuracy of each classifier is noted and compared. In addition to the generalized prediction mechanism, the data augmentation technique is also used, and the final accuracy of prediction is obtained for the classifiers LGBM, GB, and RF. A comparative study and demonstration between augmentation and non-augmentation are also discussed for the two datasets used in order to further improve the performance accuracy for predicting diabetes disease.\",\"PeriodicalId\":192934,\"journal\":{\"name\":\"Adv. Hum. Comput. Interact.\",\"volume\":\"2022 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Adv. Hum. Comput. Interact.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.4060653\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Adv. Hum. Comput. Interact.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.4060653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

今天医疗保健领域的技术进步已经在人工智能领域产生了许多新的发明。开展了疾病识别模式,并发现了许多疾病的发病预测。疾病包括糖尿病、致命的心脏病和有症状的癌症。有许多算法在疾病预测中发挥了关键作用。提出了一种基于机器学习的糖尿病疾病预测方法。对于糖尿病预测,我们比较并使用了许多ML算法,最后确定了具有最高准确率的三种ML分类器:RF、GBM和LGBM。使用两种类型的数据集来获得预测的准确性。它们是皮马印第安人数据集和一个策划数据集。使用ML分类器LGBM、GB和RF构建预测模型,并对每个分类器的准确率进行记录和比较。除了广义预测机制外,还采用了数据增强技术,对LGBM、GB和RF分类器进行了最终的预测精度。为了进一步提高预测糖尿病疾病的性能准确性,本文还对两种数据集进行了增强和非增强的对比研究和论证。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation
The technical improvements in healthcare sector today have given rise to many new inventions in the field of artificial intelligence. Patterns for disease identification are carried out, and the onset of prediction of many diseases is detected. Diseases include diabetes mellitus disease, fatal heart diseases, and symptomatic cancer. There are many algorithms that have played a critical role in the prediction of diseases. This paper proposes an ML based approach for diabetes mellitus disease prediction. For diabetes prediction, many ML algorithms are compared and used in the proposed work, and finally the three ML classifiers providing the highest accuracy are determined: RF, GBM, and LGBM. The accuracy of prediction is obtained using two types of datasets. They are Pima Indians dataset and a curated dataset. The ML classifiers LGBM, GB, and RF are used to build a predictive model, and the accuracy of each classifier is noted and compared. In addition to the generalized prediction mechanism, the data augmentation technique is also used, and the final accuracy of prediction is obtained for the classifiers LGBM, GB, and RF. A comparative study and demonstration between augmentation and non-augmentation are also discussed for the two datasets used in order to further improve the performance accuracy for predicting diabetes disease.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信