Machine learning for classifying chronic kidney disease and predicting creatinine levels using at-home measurements

medRxiv - Nephrology. Pub Date: 2024-03-18. DOI: 10.1101/2024.03.15.24304364

Brady Metherall, Anna K. Berryman, Georgia S. Brennan

Abstract

Background: Chronic kidney disease (CKD) is a global health concern, and early detection plays a pivotal role in effective management. Machine learning models show promise in CKD detection, yet the impact of different sets of clinical features on detection and classification remains under-explored.

Methods: In this study, we focus on CKD classification and creatinine prediction using three sets of features: at-home, monitoring, and laboratory. We employ artificial neural networks (ANNs) and random forests (RFs) on a dataset of 400 patients with 25 input features, which we divide into these three feature sets. Using 10-fold cross-validation, we calculate metrics such as accuracy, true positive rate (TPR), true negative rate (TNR), and mean squared error.

Results: RFs achieve higher accuracy (92.5%) than ANNs (82.9%) in at-home CKD classification. ANNs achieve a higher TPR (92.0%) but a lower TNR (67.9%) than RFs (90.0% and 95.8%, respectively). For monitoring and laboratory features, both methods achieve accuracies exceeding 98%. The R² score for creatinine regression is approximately 0.3 higher with laboratory features than with at-home features. Feature importance analysis identifies hemoglobin and blood urea as key clinical variables, and hypertension and diabetes mellitus as key comorbidities, in agreement with previous studies.

Conclusions: Machine learning models, particularly RFs, show promise in CKD diagnosis and highlight significant features in CKD detection. Moreover, such models may assist in screening a general population using at-home features, potentially increasing early detection of CKD, thus improving patient care and offering hope for a more effective approach to managing this prevalent health condition.
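The evaluation quantities named in the abstract (accuracy, TPR, TNR, mean squared error, the R² score, and 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names below are hypothetical, labels are assumed to be encoded as 1 = CKD and 0 = not CKD, both classes are assumed to be present, and the folds are taken contiguously without shuffling, which the paper may or may not do.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, TPR (sensitivity) and TNR (specificity) for binary labels.

    Assumes 1 = CKD, 0 = not CKD, and that both classes occur in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "tpr": tp / (tp + fn),  # fraction of CKD patients correctly flagged
        "tnr": tn / (tn + fp),  # fraction of non-CKD patients correctly cleared
    }


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant, so SS_tot is non-zero.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


if __name__ == "__main__":
    # Toy labels, invented purely for illustration (not data from the study).
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
    for train_idx, test_idx in k_fold_indices(400, k=10):
        print(len(train_idx), len(test_idx))
```

For a dataset of 400 patients and k = 10, each fold in this sketch holds out 40 patients for testing and trains on the remaining 360; the reported metrics would then be aggregated over the ten folds.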
