Machine Learning Techniques in Chronic Kidney Diseases: A Comparative Study of Classification Model Performance.

Impact Factor 2.4, JCR Q3, Biochemical Research Methods

Bioinformatics and Biology Insights. Pub Date: 2025-07-27. eCollection Date: 2025-01-01. DOI: 10.1177/11779322251356563

Nguyen Dong Phuong, Nguyen Trung Tuyen, Vu Thi Thai Linh, Nghi N Nguyen, Thanh Q Nguyen

Bioinformatics and Biology Insights, volume 19, article 11779322251356563. Journal Article, eCollection. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12304504/pdf/

Citations: 0

Abstract

The kidneys are vital organs that filter and eliminate toxins from the body. Chronic kidney disease (CKD) is becoming increasingly prevalent, affecting not only older adults but also younger populations. To minimize kidney damage in those at risk, accurate assessment and monitoring of CKD are crucial. Machine learning models can assist physicians in this task by providing fast and accurate detection, and many health care systems have adopted machine learning, especially for disease diagnosis. In this study, we developed a system to support the diagnosis of CKD. The data were collected from the UCI Machine Learning Repository, with missing values filled using the "mean/mode" method and the "random sampling" method. After data processing, we applied polynomial feature expansion to generate additional features, with the aim of helping the models generalize better. Then, we used feature-based stratified splitting with K-means and implemented 6 machine learning algorithms (Random Forest, Support Vector Machine [SVM], Naive Bayes, Logistic Regression, K-Nearest Neighbor [KNN], and XGBoost) to compare their performance based on accuracy. Among them, Random Forest, XGBoost, SVM, and Logistic Regression achieved the highest accuracy of 100%, followed by Naive Bayes (97%) and KNN (93%).
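The preprocessing steps named in the abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which columns received which imputation method or what polynomial degree was used, so the degree of 2, the toy values, and the function names below (`impute_mean_mode`, `impute_random_sample`, `polynomial_features`) are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement
from statistics import mean


def impute_mean_mode(column):
    """Fill None with the mean (numeric column) or the mode (categorical column).

    Assumes the column has at least one observed (non-None) value.
    """
    observed = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in observed):
        fill = mean(observed)
    else:
        # Most frequent observed category; ties go to the first one seen.
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


def impute_random_sample(column, rng):
    """Fill each None with a value drawn at random from the observed values."""
    observed = [v for v in column if v is not None]
    return [rng.choice(observed) if v is None else v for v in column]


def polynomial_features(row, degree=2):
    """Return all monomials of the row's values from degree 1 up to `degree`.

    No constant (bias) term is added in this sketch.
    """
    features = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(row, d):
            product = 1.0
            for value in combo:
                product *= value
            features.append(product)
    return features


# Toy data with hypothetical values; None marks a missing entry.
age = [48, None, 62, 50]
appetite = ["good", "poor", None, "good"]

age_filled = impute_mean_mode(age)            # missing age -> mean of observed
appetite_filled = impute_mean_mode(appetite)  # missing category -> mode
age_sampled = impute_random_sample(age, random.Random(0))
expanded = polynomial_features([2.0, 3.0])    # x1, x2, x1^2, x1*x2, x2^2
```

Mean/mode imputation pulls every filled value toward the centre of the column, whereas random sampling keeps the spread of the observed values at the cost of adding noise; that trade-off is a plausible reason to use both, although the abstract does not state the authors' rationale.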


Source journal

Bioinformatics and Biology Insights (Biochemical Research Methods)

CiteScore

6.80

Self-citation rate

1.70%

Articles published

Review time

8 weeks

Journal description: Bioinformatics and Biology Insights is an open access, peer-reviewed journal that considers articles on bioinformatics methods and their applications which must pertain to biological insights. All papers should be easily amenable to biologists and as such help bridge the gap between theories and applications.