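The impact of clinical history on the predictive performance of machine learning and deep learning models for renal complications of diabetes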

IF 4.9 | Zone 2 (Medicine) | JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer Methods and Programs in Biomedicine, Vol. 268, Article 108812 | Pub Date: 2025-05-12 | DOI: 10.1016/j.cmpb.2025.108812

Davide Dei Cas, Barbara Di Camillo, Gian Paolo Fadini, Giovanni Sparacino, Enrico Longato


Citations: 0

Abstract




Background and Objective:

Diabetes is a chronic disease characterised by a high risk of developing diabetic nephropathy. Early identification of individuals at heightened risk of such complications, or of their exacerbation, can be crucial for setting the correct course of treatment. However, there are currently no widely accepted predictive tools for this task, and most existing models rely only on information from a single baseline visit. We therefore investigate the potential predictive role of patients’ clinical history across multiple levels of renal disease severity while also developing an effective predictive model.
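The contrast between a single baseline visit and a patient's clinical history can be made concrete with a minimal sketch. It assumes each patient is a time-ordered list of visits ending at the baseline visit; the `Visit` fields, the use of HbA1c as a second dynamic variable, and the summary indicators (mean, minimum, least-squares slope, visit count) are hypothetical illustrations, not the features used in the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Visit:
    years: float  # time of the visit, in years since the first recorded visit
    egfr: float   # GFR measurement at this visit
    hba1c: float  # hypothetical second dynamic variable


def baseline_features(visits: list[Visit]) -> dict[str, float]:
    """Single-visit view: only the baseline (last, most recent) visit is used."""
    last = visits[-1]
    return {"egfr": last.egfr, "hba1c": last.hba1c}


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys on xs; 0.0 when undefined (a single time point)."""
    mx, my = mean(xs), mean(ys)
    den = sum((x - mx) ** 2 for x in xs)
    if den == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den


def history_features(visits: list[Visit]) -> dict[str, float]:
    """Baseline values plus hypothetical summary indicators of past GFR values."""
    feats = baseline_features(visits)
    egfr = [v.egfr for v in visits]
    feats.update({
        "egfr_mean": mean(egfr),
        "egfr_min": min(egfr),
        "egfr_slope_per_year": ols_slope([v.years for v in visits], egfr),
        "n_visits": float(len(visits)),
    })
    return feats


visits = [Visit(0.0, 80.0, 7.0), Visit(1.0, 75.0, 7.2), Visit(2.0, 70.0, 7.5)]
print(baseline_features(visits))                         # {'egfr': 70.0, 'hba1c': 7.5}
print(history_features(visits)["egfr_slope_per_year"])   # -5.0
```

The baseline-only view cannot distinguish a patient whose GFR has been stable at 70 from one falling by 5 units per year; the history features can.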

Methods:

Using data collected in the DARWIN–Renal (DApagliflozin Real-World evIdeNce-Renal) study, a nationwide multicentre retrospective real-world study, we develop four types of machine learning models (logistic regression, random forest, Cox proportional hazards regression, and a deep learning model based on a recurrent neural network) to predict the crossing of five clinically relevant glomerular filtration rate thresholds in patients with type 2 diabetes.
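Predicting the crossing of a glomerular filtration rate threshold requires turning each patient's follow-up into an outcome: a binary label for logistic regression and random forest, and a (time, event) pair for Cox regression. The sketch below shows one way to do this under loudly stated assumptions: the abstract does not list the five thresholds, so the KDIGO GFR-category boundaries (90, 60, 45, 30, 15 mL/min/1.73 m²) stand in for them, and a single follow-up value below a threshold is treated as a crossing. `label_crossings` and `Outcome` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: the abstract does not list the five thresholds. These are the KDIGO
# GFR-category boundaries (mL/min/1.73 m^2), chosen here purely for illustration.
THRESHOLDS = (90.0, 60.0, 45.0, 30.0, 15.0)


@dataclass
class Outcome:
    threshold: float
    eligible: bool         # baseline GFR at or above the threshold (assumed rule)
    event: bool            # a follow-up value fell below the threshold
    time: Optional[float]  # years to first crossing, or to last follow-up if censored


def label_crossings(baseline_egfr: float,
                    followup: list[tuple[float, float]]) -> list[Outcome]:
    """followup: (years_after_baseline, egfr) pairs sorted by time.

    A single value below the threshold counts as a crossing here; a real study
    may require confirmation, which this sketch does not model.
    """
    last_time = followup[-1][0] if followup else 0.0
    outcomes = []
    for thr in THRESHOLDS:
        if baseline_egfr < thr:  # already below: this outcome is not applicable
            outcomes.append(Outcome(thr, False, False, None))
            continue
        first = next((t for t, g in followup if g < thr), None)
        if first is None:        # no crossing: censored at the last measurement
            outcomes.append(Outcome(thr, True, False, last_time))
        else:
            outcomes.append(Outcome(thr, True, True, first))
    return outcomes


for o in label_crossings(70.0, [(0.5, 62.0), (1.0, 58.0), (2.0, 44.0)]):
    print(o)
```

For the classifiers, `event` would serve as the target among eligible patients; for the Cox model, the (`time`, `event`) pair is the survival outcome.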

Results:

The predictive performance of all models is satisfactory for all outcomes even without information from past visits: AUROC and C-index range between 0.69 and 0.98, and average precision is well above that of a random model. Introducing past information clearly improves performance for all models, with percentage increases of up to 12% for both AUROC and C-index and up to 300% for average precision. A feature importance analysis further corroborates the usefulness of past information.
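The three metrics quoted above can be computed from their definitions in a few lines. This is a dependency-free sketch for illustration, not the study's evaluation code: AUROC uses the Mann-Whitney pairwise form, average precision averages the precision at the rank of each positive, and the C-index is Harrell's version for right-censored data. A random model's expected average precision is approximately the outcome prevalence, so a rare outcome can start from a low AP and show large relative gains, whereas AUROC and C-index already start between 0.69 and 0.98 and are capped at 1. All function names are hypothetical.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Mann-Whitney form: P(random positive scores above random negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean precision at the rank of each positive; tied scores keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def harrell_c_index(times: list[float], events: list[int],
                    risks: list[float]) -> float:
    """Harrell's C: a pair is comparable when the shorter time is an observed event;
    it is concordant when that patient also has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def pct_increase(before: float, after: float) -> float:
    """Relative change in percent, as in 'increases of up to 12%'."""
    return 100.0 * (after - before) / before


print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))                          # 0.75
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))  # 0.8
```

With purely hypothetical numbers, an average precision rising from 0.05 to 0.20 gives `pct_increase(0.05, 0.20)` ≈ 300, while an AUROC rising from 0.80 to 0.896 gives ≈ 12.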

Conclusions:

Incorporating data from the patients’ clinical history into the predictive models greatly improves their performance, particularly for the recurrent neural network, which is given the full sequence of values of the dynamic variables rather than synthetic indicators of past history.
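The difference in input format can be sketched with a tiny recurrent network: instead of a fixed-length vector of summary indicators, it consumes one feature vector per visit, in time order, so histories of any length are accepted. This is a minimal, dependency-free sketch assuming a plain Elman cell with random, untrained weights; the abstract does not specify the study's recurrent architecture or training, and `TinyElmanRNN` and the per-visit inputs are hypothetical.

```python
import math
import random


class TinyElmanRNN:
    """Untrained single-layer Elman RNN: visit sequence -> risk score in (0, 1).

    Illustration only: weights are random, and the study's actual architecture
    and training procedure are not described in the abstract.
    """

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def u() -> float:
            return rng.uniform(-0.5, 0.5)

        self.W_x = [[u() for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.W_h = [[u() for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.w_out = [u() for _ in range(n_hidden)]
        self.b_out = 0.0

    def step(self, h: list[float], x: list[float]) -> list[float]:
        """h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)."""
        return [
            math.tanh(
                sum(w * xi for w, xi in zip(self.W_x[k], x))
                + sum(w * hj for w, hj in zip(self.W_h[k], h))
                + self.b_h[k]
            )
            for k in range(len(h))
        ]

    def predict(self, sequence: list[list[float]]) -> float:
        """Feed visits oldest-first; read the risk from the final hidden state."""
        h = [0.0] * len(self.b_h)
        for x in sequence:
            h = self.step(h, x)
        z = sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
        return 1.0 / (1.0 + math.exp(-z))


model = TinyElmanRNN(n_inputs=2, n_hidden=4)
# Hypothetical per-visit inputs: [GFR / 100, HbA1c / 10], oldest visit first.
short_history = [[0.70, 0.75]]
long_history = [[0.80, 0.70], [0.75, 0.72], [0.70, 0.75]]
print(model.predict(short_history), model.predict(long_history))
```

In practice such a model would be trained, typically with gated cells (LSTM or GRU) in a deep learning framework; the point here is only that the final hidden state summarises the whole sequence, so the network can learn its own indicators of past history instead of relying on hand-crafted ones.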


Source journal

Computer Methods and Programs in Biomedicine (Engineering & Technology: Engineering, Biomedical)

CiteScore: 12.30

Self-citation rate: 6.60%

Articles published: 601

Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.