Neural network-based analysis of clinical and demographic variables for predicting platelet counts and prothrombin time in chronic kidney disease patients

IF 7.5 | Zone 2, Computer Science | Q1 AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence. Pub Date: 2025-07-22. DOI: 10.1016/j.engappai.2025.111741

Simin Nazari, Amira Abdelrasoul

Volume 159, Article 111741. https://www.sciencedirect.com/science/article/pii/S0952197625017439

Citations: 0

Abstract

Objective

In chronic kidney disease (CKD), key health indicators such as platelet count, prothrombin time (PT), and depression play pivotal roles in patient management and outcome prediction. Platelet count and PT are essential for assessing coagulation status, which can be compromised in CKD, while depression substantially affects quality of life and treatment adherence. This study applies machine learning (ML) techniques to characterize how variables interact in CKD and how they jointly influence platelet count and PT. Specifically, we use neural networks to examine how patient characteristics, including biological sex, age, and pre-existing medical conditions, affect these blood parameters. In doing so, we aim to provide clinically actionable insights that could support personalized monitoring and more effective risk stratification in CKD management.

Data source and setting

The data for this study came from an open-source dataset originally used in the study by Li et al. and also published on Kaggle, an open data platform.

Methods

The data originate from the Medical Information Mart for Intensive Care (MIMIC-III) database, and the analysis cohort comprises 1177 patients. The dataset was randomly split into a training set and a validation set.
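The abstract does not report the split ratio or the random seed. As a minimal sketch, assuming a hypothetical 80/20 split with a fixed seed, a reproducible random partition of the 1177 patient records into training and validation indices could look like this (the function `random_split` and its defaults are illustrative, not taken from the study):

```python
import random

def random_split(n_records, val_fraction=0.2, seed=42):
    """Randomly partition record indices into training and validation sets.

    val_fraction and seed are hypothetical defaults; the study's actual
    split ratio and seed are not given in the abstract.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded shuffle, so the split is reproducible
    n_val = round(n_records * val_fraction)
    return indices[n_val:], indices[:n_val]  # (training indices, validation indices)

train_idx, val_idx = random_split(1177)
print(len(train_idx), len(val_idx))  # → 942 235
```

Fixing the seed makes the partition identical across runs, so validation results can be compared between model variants; the validation indices are never used for weight updates.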

We used a sequential neural network (NN) model trained with backpropagation to predict platelet count, PT, and depression indicators. Model performance was evaluated on the validation set by tracking the reduction in loss and the improvement in accuracy over training iterations. Mean Squared Error (MSE) and Mean Absolute Error (MAE) were also used to quantify prediction accuracy.
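The abstract does not describe the network architecture, so the following is only a minimal sketch of the technique it names: a small feedforward network trained by backpropagation on squared error, with loss tracked per epoch and MSE and MAE computed on held-out data. It is written in pure Python to stay self-contained. The one-hidden-layer tanh design, layer width, learning rate, epoch count, and the synthetic toy data are all assumptions, and this is not the authors' model; in practice such models are usually built with a deep-learning framework.

```python
import math
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

class TinyMLP:
    """Hypothetical network: one tanh hidden layer and one linear output unit."""

    def __init__(self, n_inputs, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def predict(self, x):
        return self._forward(x)[1]

    def train_step(self, x, y, lr):
        """One backpropagation update on a single (x, y) pair."""
        hidden, out = self._forward(x)
        d_out = 2.0 * (out - y)  # gradient of (out - y)**2 with respect to out
        for j, h in enumerate(hidden):
            # Chain rule through w2[j] and tanh, computed before w2[j] changes.
            d_pre = d_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * d_out * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_pre * xi
            self.b1[j] -= lr * d_pre
        self.b2 -= lr * d_out

def fit(model, xs, ys, epochs=200, lr=0.05, seed=0):
    """Shuffled per-sample gradient descent; returns training MSE after each epoch."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    history = []
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            model.train_step(xs[k], ys[k], lr)
        history.append(mse(ys, [model.predict(x) for x in xs]))
    return history

# Hypothetical toy data: two features scaled to [0, 1] and a synthetic target,
# standing in for the clinical variables used in the study.
data_rng = random.Random(1)
xs = [[data_rng.random(), data_rng.random()] for _ in range(100)]
ys = [0.6 * a - 0.4 * b + 0.2 * a * b for a, b in xs]
train_x, val_x = xs[:80], xs[80:]
train_y, val_y = ys[:80], ys[80:]

model = TinyMLP(n_inputs=2)
before = mse(val_y, [model.predict(x) for x in val_x])
history = fit(model, train_x, train_y)
val_pred = [model.predict(x) for x in val_x]
print(f"val MSE before={before:.4f} after={mse(val_y, val_pred):.4f} "
      f"MAE={mae(val_y, val_pred):.4f}")
```

The per-epoch `history` list corresponds to the loss-over-iterations curve used to judge convergence, while the final MSE and MAE are computed only on the 20 held-out rows.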

Results

The NN model was trained to predict platelet count and PT, both critical indicators in patients with chronic kidney disease. The model achieved an accuracy of 65.01% for platelet count prediction (MSE = 0.0098) and 75.85% for PT prediction (MSE = 0.0091); the training loss decreased rapidly and then plateaued, indicating stable convergence. For platelet count, regression analysis identified blood acidity (pH) (β = 0.180, p < 0.001) and age (β = −0.162, p < 0.001) as the most significant predictors. In contrast, feature importance analysis based on Shapley Additive Explanations (SHAP) ranked pH and gender as the most impactful features, followed by hyperlipemia and age, highlighting the non-linear interactions captured by the neural network. For PT, hyperlipemia (β = 1.4162, p < 0.05) and hypertensive status (β = −1.5667, p < 0.05) were significant predictors in the regression model, and SHAP also identified these two variables as the most influential contributors. These findings underscore the complementary strengths of traditional statistical methods and explainable artificial intelligence (AI) techniques such as SHAP in capturing both direct effects and complex, non-linear relationships, ultimately improving model interpretability and clinical applicability in predictive healthcare analytics.
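SHAP attributes a model's prediction to its input features using Shapley values. The `shap` library estimates these with explainers such as `KernelExplainer`, which works against a background dataset; the sketch below instead computes them exactly by enumerating every feature coalition and substituting a single baseline value for absent features, which is feasible only for a handful of features. The toy model `2*z0 + z1*z2` and its inputs are hypothetical. It illustrates one way a model-based ranking can diverge from regression coefficients: an interaction term has no single "direct effect", so its contribution is split between the interacting features.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Weighted marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical model: an additive effect of feature 0 plus a feature-1 x feature-2 interaction.
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]

phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # → [2.0, 0.5, 0.5]
```

The three values sum to `toy_model(x) - toy_model(baseline) = 3` (the efficiency property), and the one unit of output produced by the interaction is shared equally between features 1 and 2. Because each feature requires evaluating all 2^(n-1) coalitions of the others, practical tools approximate this sum rather than enumerating it.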

Conclusions

The backpropagation NN model used in this study represents a robust approach for analyzing and predicting key coagulation indicators in CKD patients. By identifying the clinical variables that most strongly drive variation in platelet count and PT, this framework offers practical value for patient-specific risk assessment, monitoring strategies, and individualized treatment planning. Despite limitations related to dataset size, the model provides a foundation for clinically meaningful decision-support tools in nephrology.

Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days

Journal description: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.