Predicting and Staging Chronic Kidney Disease using Optimized Random Forest Algorithm

2021 International Conference on Information Systems and Advanced Technologies (ICISAT). Pub Date: 2021-12-27. DOI: 10.1109/ICISAT54145.2021.9678441

Sarra Samet, Mohamed Ridda Laouar, Issam Bendib


Citations: 1

Abstract

Chronic Kidney Disease (CKD) is a silent killer in wealthy countries and is listed among the leading causes of death in impoverished countries. Because of its rising incidence, CKD ranks among the most serious public health problems. Early detection of CKD may reduce the severity of later damage, but it normally requires the patient to visit a diagnostic facility and consult a doctor. Machine learning offers a way to ease this burden. The main objective of this study is to build a model that can reliably predict a person’s risk of acquiring CKD. Data mining and machine learning techniques have been widely employed for predicting chronic renal disease, but little research has combined imputation approaches at the pre-processing stage with a feature selection strategy to improve classification accuracy. The CKD dataset used in the experiments consists of 400 records with 25 attributes and is available from the UCI Machine Learning Repository. It has a large number of missing values, which is why we propose combining several missing-data imputation strategies. The chi-square test was used to select features. A supervised classification model, Random Forest (RF), is optimized with grid search to diagnose CKD at an early stage. The model was evaluated with several metrics under 5-fold cross-validation. Our RF reached 99.24% accuracy, and the best result was obtained with the 10 best-selected features. Compared with previous studies, our results are among the best in terms of evaluation metrics and accuracy, while using fewer features. In practice, this study provides decision support for the diagnosis, prevention, and prediction of renal disease.
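The workflow named in the abstract (imputation, chi-square feature selection, a Random Forest tuned by grid search, 5-fold cross-validation) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for the UCI CKD table, a single median imputer replaces the paper's combination of imputation strategies, and the parameter grid and the helper names `make_synthetic_data` and `build_search` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def make_synthetic_data(n_samples=400, n_features=24, missing_rate=0.1, seed=0):
    """Hypothetical stand-in for the CKD table: numeric features with NaNs."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    X = rng.normal(size=(n_samples, n_features))
    # Make the first five features informative by shifting them per class.
    X[:, :5] += 2.0 * y[:, None]
    # Blank out entries at random to mimic missing values.
    X[rng.random(X.shape) < missing_rate] = np.nan
    return X, y


def build_search(k_best=10, n_folds=5, seed=0):
    """Impute -> scale -> chi-square selection -> Random Forest, tuned by grid search."""
    pipeline = Pipeline([
        # Assumption: one median imputer; the paper combines several strategies.
        ("impute", SimpleImputer(strategy="median")),
        # chi2 needs non-negative inputs, so rescale the training data to [0, 1].
        ("scale", MinMaxScaler()),
        ("select", SelectKBest(score_func=chi2, k=k_best)),
        ("forest", RandomForestClassifier(random_state=seed)),
    ])
    # Assumption: an illustrative grid, not the one used in the paper.
    param_grid = {
        "forest__n_estimators": [50, 100],
        "forest__max_depth": [None, 5],
    }
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")


X, y = make_synthetic_data()
search = build_search()
search.fit(X, y)
print("best params:", search.best_params_)
print("mean 5-fold CV accuracy:", round(search.best_score_, 4))
```

Keeping imputation and feature selection inside the `Pipeline` means both are refit on each training fold, so the cross-validated score is not inflated by information from the held-out fold. The real dataset also contains categorical attributes, which would need encoding before the chi-square step.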


