A Novel Diabetes Prediction Model in Big Data Healthcare Systems Using DA-KNN Technique

IF 0.8 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Image and Graphics. Pub Date: 2023-11-03. DOI: 10.1142/s0219467825500469

N. P. Jayasri, R. Aruna


Citations: 0

Abstract

Over the past decades, the number of people affected by diabetes, a chronic illness, has increased substantially. Early prediction of diabetes remains challenging because a precise prediction requires clean and reliable datasets. In this era of ubiquitous information technology, big data makes it possible to collect large amounts of information about healthcare systems. With the explosion in the generation of digital data, however, selecting appropriate data for analysis remains a complex task. Moreover, missing values and poorly labeled data limit prediction accuracy. To improve the quality of the dataset, the proposed model works in three major phases: (1) pre-processing, (2) feature extraction, and (3) classification. Pre-processing involves outlier rejection and filling in missing values. Feature extraction is performed with principal component analysis (PCA), and diabetes is then predicted with a distance adaptive-KNN (DA-KNN) classifier. The experiments were conducted on the Pima Indian Diabetes (PID) dataset, and the performance of the proposed model was compared with state-of-the-art models. The results show that the proposed model outperforms conventional models such as NB, SVM, KNN, and RF in terms of accuracy and ROC.
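The three phases named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract gives no implementation details: median imputation, an IQR rule for outlier rejection, PCA via SVD, and an inverse-distance-weighted k-NN vote used as a stand-in for DA-KNN (it is not the authors' classifier). The function names and the synthetic data are hypothetical; the PID dataset is not used.

```python
import numpy as np

def fill_missing(X):
    """Replace NaNs in each column with that column's median (assumed strategy)."""
    X = X.astype(float).copy()
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

def reject_outliers(X, y, k=1.5):
    """Drop rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    keep = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[keep], y[keep]

def pca_fit(X, n_components):
    """Return (mean, components) computed from an SVD of the centred data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the principal components found by pca_fit."""
    return (X - mean) @ components.T

def weighted_knn_predict(X_train, y_train, X_test, k=5, eps=1e-9):
    """Inverse-distance-weighted k-NN vote; a stand-in, NOT the paper's DA-KNN."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]            # indices of the k nearest neighbours
        w = 1.0 / (d[idx] + eps)           # closer neighbours get larger votes
        scores = [w[y_train[idx] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

def run_demo(seed=0):
    """Run the pipeline on synthetic two-class data and return test accuracy."""
    rng = np.random.default_rng(seed)
    n = 100
    X = np.vstack([rng.normal(0.0, 1.0, (n, 4)), rng.normal(5.0, 1.0, (n, 4))])
    y = np.array([0] * n + [1] * n)
    X[rng.random(X.shape) < 0.05] = np.nan  # knock out roughly 5% of entries
    X = fill_missing(X)
    order = rng.permutation(len(y))
    train, test = order[:150], order[150:]
    X_tr, y_tr = reject_outliers(X[train], y[train])
    mean, comps = pca_fit(X_tr, n_components=2)
    preds = weighted_knn_predict(
        pca_transform(X_tr, mean, comps), y_tr,
        pca_transform(X[test], mean, comps),
    )
    return float((preds == y[test]).mean())

if __name__ == "__main__":
    print("test accuracy:", run_demo())
```

In this sketch the PCA mean and components are fitted on the training rows only and then reused for the test rows, so no information from the test split leaks into the projection. Outliers are likewise rejected only from the training split.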




Source journal

International Journal of Image and Graphics COMPUTER SCIENCE, SOFTWARE ENGINEERING-

CiteScore

2.40

Self-citation rate

18.80%

Number of articles published