Davide Dei Cas, Barbara Di Camillo, Gian Paolo Fadini, Giovanni Sparacino, Enrico Longato
Computer Methods and Programs in Biomedicine, volume 268, Article 108812. Published 2025-05-12. DOI: 10.1016/j.cmpb.2025.108812. https://www.sciencedirect.com/science/article/pii/S0169260725002299
The impact of clinical history on the predictive performance of machine learning and deep learning models for renal complications of diabetes
Background and Objective:
Diabetes is a chronic disease characterised by a high risk of developing diabetic nephropathy. Early identification of individuals at heightened risk of such complications, or of their exacerbation, can be crucial for setting a correct course of treatment. However, there are currently no widely accepted predictive tools for this task and, additionally, most existing models rely only on information from a single baseline visit. Considering this, we investigate the potential predictive role of patients’ clinical history across multiple levels of renal disease severity while, at the same time, developing an effective predictive model.
Methods:
Using data collected in the DARWIN–Renal (DApagliflozin Real-World evIdeNce-Renal) study, a nationwide multicentre retrospective real-world study, we develop four types of machine learning models, namely logistic regression, random forest, Cox proportional hazards regression, and a deep learning model based on a recurrent neural network, to predict the crossing of five clinically relevant glomerular filtration rate thresholds in patients with type 2 diabetes.
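To make the prediction target concrete, the following minimal sketch shows one way threshold-crossing outcomes could be derived from follow-up visits. It is an illustration under stated assumptions, not the study's procedure: the abstract does not list the five thresholds, so the KDIGO GFR category boundaries are used as placeholders, and the function name `crossing_labels`, the visit layout, and the rule of skipping patients already below a threshold at baseline are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Placeholder thresholds in mL/min/1.73 m^2 (KDIGO GFR category boundaries).
# ASSUMPTION: the abstract does not state which five thresholds the study used.
THRESHOLDS: Tuple[float, ...] = (90.0, 60.0, 45.0, 30.0, 15.0)

# A follow-up visit: (years since baseline, eGFR value). Hypothetical layout.
Visit = Tuple[float, float]


def crossing_labels(
    baseline_egfr: float,
    followup: List[Visit],
    thresholds: Tuple[float, ...] = THRESHOLDS,
) -> Dict[float, Optional[Tuple[bool, float]]]:
    """Build one (event, time) label per threshold.

    event is True if eGFR drops below the threshold during follow-up;
    time is the first crossing time, or the last visit time if censored.
    Thresholds the patient is already below at baseline map to None
    (ASSUMPTION: such patients are not at risk for that outcome).
    """
    visits = sorted(followup)
    last_time = visits[-1][0] if visits else 0.0
    labels: Dict[float, Optional[Tuple[bool, float]]] = {}
    for thr in thresholds:
        if baseline_egfr < thr:
            labels[thr] = None
            continue
        crossing_time = next((t for t, egfr in visits if egfr < thr), None)
        if crossing_time is None:
            labels[thr] = (False, last_time)  # censored at last visit
        else:
            labels[thr] = (True, crossing_time)
    return labels


# Made-up patient: baseline eGFR 72, three follow-up visits.
example = crossing_labels(72.0, [(0.5, 65.0), (1.0, 58.0), (2.0, 47.0)])
for thr, label in example.items():
    print(thr, label)
```

The event flag alone would serve as a target for the classifiers (logistic regression, random forest), while the (event, time) pair is the form a time-to-event model such as Cox regression expects.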
Results:
The predictive performance of all models is satisfactory for all outcomes, even without information from past visits, with AUROC and C-index between 0.69 and 0.98 and average precision well above that of a random model. Introducing past information results in a clear improvement in performance for all models, with percentage increases of up to 12% for both AUROC and C-index and 300% for average precision. The usefulness of past information is further corroborated by a feature importance analysis.
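For readers less familiar with these metrics, the sketch below computes AUROC, average precision, and a percentage increase in plain Python on made-up scores. It is not the authors' evaluation code; the function names and toy data are hypothetical, and "percentage increase" is read here as relative change, which is an assumption. The average precision of a random model is approximately the outcome prevalence, which is why that value serves as the reference point.

```python
from typing import List, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive.

    Assumes no tied scores; ties are broken by input order.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


def prevalence(y_true: Sequence[int]) -> float:
    """Approximate average precision of a random model."""
    return sum(y_true) / len(y_true)


def percent_increase(before: float, after: float) -> float:
    """Relative change, in percent, from `before` to `after`."""
    return 100.0 * (after - before) / before


# Made-up labels and scores, for illustration only.
y: List[int] = [1, 0, 1, 0, 0, 0]
baseline_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]   # hypothetical model without history
history_scores = [0.9, 0.3, 0.8, 0.4, 0.2, 0.1]    # hypothetical model with history

print("AUROC:", auroc(y, baseline_scores), "->", auroc(y, history_scores))
print("AP:", average_precision(y, baseline_scores), "->", average_precision(y, history_scores))
print("random-model AP:", prevalence(y))
```

Under this reading, a 300% increase means the metric quadruples: a hypothetical average precision of 0.10 rising to 0.40 gives `percent_increase(0.10, 0.40)` of about 300.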
Conclusions:
Incorporating data from patients’ clinical history into the predictive models greatly improves their performance, particularly for the recurrent neural network, which is given the full sequence of values for dynamic variables rather than synthetic indicators of past history.
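The contrast between the two kinds of input can be sketched as follows. The abstract does not say which synthetic indicators were used, so the ones below (last value, mean, minimum, least-squares slope) are hypothetical, as are the function names and the left-padding scheme; the sketch only illustrates the difference between summarising a history and passing it whole to a sequence model.

```python
from typing import Dict, List, Sequence, Tuple


def summarize_history(times: Sequence[float], values: Sequence[float]) -> Dict[str, float]:
    """Collapse a visit history into fixed-size indicators (hypothetical choice)."""
    n = len(values)
    if n == 0 or n != len(times):
        raise ValueError("times and values must be non-empty and of equal length")
    mean_v = sum(values) / n
    mean_t = sum(times) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx if sxx > 0 else 0.0  # least-squares trend per unit time
    return {"last": values[-1], "mean": mean_v, "min": min(values), "slope": slope}


def pad_sequence(
    values: Sequence[float], length: int, pad_value: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Keep the most recent `length` values, left-padding shorter histories.

    Returns the padded values and a mask (1 = real visit, 0 = padding),
    the usual fixed-shape input format for a recurrent model.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    recent = list(values[-length:])
    n_pad = length - len(recent)
    return [pad_value] * n_pad + recent, [0] * n_pad + [1] * len(recent)


# Made-up eGFR history over four yearly visits.
times = [0.0, 1.0, 2.0, 3.0]
egfr = [80.0, 76.0, 70.0, 66.0]

print(summarize_history(times, egfr))   # input style for tabular models
print(pad_sequence(egfr, length=6))     # input style for a sequence model
```

The summary discards the order and spacing of individual measurements beyond what the chosen indicators capture, whereas the padded sequence preserves every value.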
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.