ML-CKDP: Machine learning-based chronic kidney disease prediction with smart web application

JCR: Q2 (Medicine)

Journal of Pathology Informatics, Volume 15, Article 100371. Pub Date: 2024-02-22. DOI: 10.1016/j.jpi.2024.100371

Rajib Kumar Halder, Mohammed Nasir Uddin, Md. Ashraf Uddin, Sunil Aryal, Sajeeb Saha, Rakib Hossen, Sabbir Ahmed, Mohammad Abu Tareq Rony, Mosammat Farida Akter



Abstract

Chronic kidney diseases (CKDs) are a significant public health issue that can lead to severe complications such as hypertension, anemia, and renal failure, so timely diagnosis is crucial for effective management. Machine learning offers a promising route to better predictive diagnostics in healthcare. In this paper, we develop a machine learning-based chronic kidney disease prediction (ML-CKDP) model with two objectives: to improve dataset preprocessing for CKD classification and to provide a web-based application for CKD prediction. The preprocessing protocol converts categorical variables to numerical values, imputes missing data, and normalizes features with Min-Max scaling. To refine the datasets, feature selection is performed with several techniques: Correlation, Chi-Square, Variance Threshold, Recursive Feature Elimination, Sequential Forward Selection, Lasso Regression, and Ridge Regression. Seven classifiers are used to predict CKD: Random Forest (RF), AdaBoost (AdaB), Gradient Boosting (GB), XGBoost (XgB), Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). The models are assessed by their accuracy, by confusion matrix statistics, and by the Area Under the Curve (AUC) for the classification of positive cases. RF and AdaB achieve 100% accuracy under every validation scheme tested: 70:30 and 80:20 train-test splits and K-fold cross-validation with K = 10 and K = 15. RF and AdaB also consistently reach an AUC of 100% across multiple datasets and splitting ratios. NB is the most efficient classifier, recording the lowest training and testing times across all datasets and split ratios. Finally, we present a real-time web-based application that operationalizes the model and makes it accessible to healthcare practitioners and other stakeholders.

Web app link: https://rajib-research-kedney-diseases-prediction.onrender.com/
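
The workflow described in the abstract (encode categorical variables, impute missing values, apply Min-Max scaling, select features, train a classifier, then report accuracy, a confusion matrix, and AUC on a 70:30 split) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy data, the column names (num_a, num_b, cat_a), the imputation strategies (mean and most frequent), the choice of Chi-Square with k = 2, and the Random Forest settings are all hypothetical, and the sketch shows only one of the seven feature-selection techniques and one of the seven classifiers named above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder


def make_toy_data(n=300, seed=0):
    """Synthetic stand-in for a CKD table (hypothetical columns, not real data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    num_a = rng.normal(loc=1.5 * y, scale=1.0)  # weakly informative numeric feature
    num_b = rng.normal(size=n)                  # pure-noise numeric feature
    cat_a = np.where(rng.random(n) < 0.2 + 0.6 * y, "abnormal", "normal").astype(object)
    # Knock out roughly 10% of values to mimic missing entries.
    num_a[rng.random(n) < 0.1] = np.nan
    cat_a[rng.random(n) < 0.1] = np.nan
    X = pd.DataFrame({
        "num_a": num_a,
        "num_b": num_b,
        "cat_a": pd.Series(cat_a, dtype=object),
    })
    return X, y


def build_pipeline(numeric_cols, categorical_cols, k_features=2):
    """Impute, encode, Min-Max scale, select features, then classify."""
    numeric = SimpleImputer(strategy="mean")  # assumed strategy
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # assumed strategy
        ("encode", OrdinalEncoder()),  # categorical -> numeric codes
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("scale", MinMaxScaler()),  # chi2 needs non-negative inputs
        ("select", SelectKBest(chi2, k=k_features)),
        ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])


def evaluate(seed=0):
    """Fit on a 70:30 stratified split and report the metrics named in the abstract."""
    X, y = make_toy_data(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    model = build_pipeline(["num_a", "num_b"], ["cat_a"])
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "n_test": len(y_test),
        "accuracy": accuracy_score(y_test, pred),
        "confusion_matrix": confusion_matrix(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
    }
```

Calling evaluate() returns a dictionary with the test-set size, accuracy, 2x2 confusion matrix, and AUC. Wrapping every step in a single Pipeline means the imputer, scaler, and feature selector are fitted on the training portion only, which avoids leaking test-set statistics into the model. The scores on this synthetic data say nothing about the results reported in the paper.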




Source journal

Journal of Pathology Informatics (Medicine: Pathology and Forensic Medicine)

CiteScore: 3.70

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: 18 weeks

About the journal: The Journal of Pathology Informatics (JPI) is an open access peer-reviewed journal dedicated to the advancement of pathology informatics. This is the official journal of the Association for Pathology Informatics (API). The journal aims to publish broadly about pathology informatics and freely disseminate all articles worldwide. This journal is of interest to pathologists, informaticians, academics, researchers, health IT specialists, information officers, IT staff, vendors, and anyone with an interest in informatics. We encourage submissions from anyone with an interest in the field of pathology informatics. We publish all types of papers related to pathology informatics including original research articles, technical notes, reviews, viewpoints, commentaries, editorials, symposia, meeting abstracts, book reviews, and correspondence to the editors. All submissions are subject to rigorous peer review by the well-regarded editorial board and by expert referees in appropriate specialties.