RFFE - Random Forest Fuzzy Entropy for the classification of Diabetes Mellitus.

Impact factor: 3.1 · JCR Q2 · Health Care Sciences & Services

AIMS Public Health, vol. 10, no. 2, pp. 422-442 · Pub Date: 2023-01-01 · DOI: 10.3934/publichealth.2023030

A Usha Ruby, J George Chellin Chandran, T J Swasthika Jain, B N Chaithanya, Renuka Patil



Abstract

Diabetes is a group of metabolic diseases and a common chronic illness. It causes the body to produce less insulin and raises blood sugar levels, leading to various complications and disrupting the functioning of organs, including the retina, kidneys, and nerves. To prevent this, people with chronic illnesses require lifelong access to treatment. Early detection of diabetes is therefore essential and might save many lives. Identifying people at high risk of developing diabetes supports prevention of the disease in several ways. This article presents a chronic illness prediction prototype that uses a person's risk feature data to provide an early prediction of diabetes, with Fuzzy Entropy random vectors that regulate the development of each tree in the Random Forest. The proposed prototype consists of data imputation, data sampling, feature selection, and various techniques to predict the disease, such as Fuzzy Entropy, Synthetic Minority Oversampling Technique (SMOTE), Convolutional Neural Network (CNN) with Stochastic Gradient Descent with Momentum (SGDM), Support Vector Machines (SVM), Classification and Regression Tree (CART), K-Nearest Neighbor (KNN), and Naïve Bayes (NB). This study uses the existing Pima Indian Diabetes (PID) dataset for diabetes prediction. The true/false positive/negative rates of the predictions are investigated using the confusion matrix and the area under the receiver operating characteristic curve (ROC AUC). Results on the PID dataset are compared with those of other machine learning algorithms, showing that the proposed Random Forest Fuzzy Entropy (RFFE) is a valuable approach for diabetes prediction, with an accuracy of 98 percent.
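The evaluation step mentioned in the abstract (confusion matrix rates and ROC AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores below are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any procedure taken from the paper.

```python
from typing import Dict, List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (1 = diabetic, 0 = non-diabetic)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def rates(c: Dict[str, int]) -> Dict[str, float]:
    """Derive true/false positive/negative rates and accuracy from the counts."""
    pos = c["tp"] + c["fn"]  # actual positives
    neg = c["tn"] + c["fp"]  # actual negatives
    return {
        "tpr": c["tp"] / pos if pos else 0.0,
        "fnr": c["fn"] / pos if pos else 0.0,
        "tnr": c["tn"] / neg if neg else 0.0,
        "fpr": c["fp"] / neg if neg else 0.0,
        "accuracy": (c["tp"] + c["tn"]) / (pos + neg) if (pos + neg) else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the PID dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(confusion_counts(y_true, y_pred))
print(rates(confusion_counts(y_true, y_pred)))
print(roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a dataset of a few hundred rows; a sort-based computation would be the usual choice for larger data.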
