A. Andreychenko, A. D. Ermak, D. V. Gavrilov, R. Novitskiy, A. V. Gusev
Diabetes Mellitus. Published 2024-05-06. DOI: 10.14341/dm13065 (https://doi.org/10.14341/dm13065)
Development and validation of machine learning models to predict unplanned hospitalizations of patients with diabetes within the next 12 months
BACKGROUND: The incidence of diabetes mellitus (DM) has been increasing steadily for several decades, both in the Russian Federation and worldwide. Steady population growth and the current epidemiological characteristics of DM lead to enormous economic costs and significant social losses throughout the world. The disease often progresses with the development of specific complications, which significantly increases the likelihood of hospitalization. Building and applying a machine learning model that predicts inpatient hospitalizations of patients with DM would make it possible to personalize medical care and optimize the load on the healthcare system as a whole.
AIM: To develop and validate models that predict unplanned hospitalizations of patients with diabetes due to the disease itself and its complications, using machine learning algorithms and data from real clinical practice.
MATERIALS AND METHODS: The study included 170,141 depersonalized electronic health records of 23,742 patients with diabetes. Anamnestic, constitutional, clinical, instrumental and laboratory data that are widely used in routine medical practice were considered as potential predictors, 33 features in total. Logistic regression (LR), gradient boosting methods (LightGBM, XGBoost, CatBoost), decision tree-based methods (RandomForest and ExtraTrees), and a neural network-based algorithm (Multi-layer Perceptron) were compared. External validation was performed on data from a separate region of the Russian Federation.
RESULTS: The LightGBM model showed the best results and the greatest stability on external validation data, with an AUC of 0.818 (95% CI 0.802–0.834) in internal testing and 0.802 (95% CI 0.773–0.832) in external validation.
CONCLUSION: The metrics of the best model were superior to those reported in previously published studies. External validation showed that the model is relatively stable on new data from another region, which indicates that the model could be applied in real clinical practice.
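The abstract reports each AUC together with a 95% confidence interval but does not say how the intervals were obtained. As a rough illustration only, the sketch below computes an AUC from ranks and estimates a percentile bootstrap interval around it. The bootstrap procedure, the function names `roc_auc` and `bootstrap_auc_ci`, and the synthetic labels and scores are all assumptions made for this example; none of them come from the study.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case (ties count as one half). Labels must be 0/1."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lbl for _, lbl in pairs[i:j])
        i = j
    # Mann-Whitney U statistic for the positives, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval (an assumed
    method, not necessarily the one used by the authors)."""
    point = roc_auc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        y = [labels[k] for k in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(y, [scores[k] for k in idx]))
    aucs.sort()
    lo = aucs[round((alpha / 2) * (n_boot - 1))]
    hi = aucs[round((1 - alpha / 2) * (n_boot - 1))]
    return point, lo, hi


if __name__ == "__main__":
    # Synthetic stand-in for model output: about 20% positives whose
    # scores are shifted upward relative to the negatives.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.2 else 0 for _ in range(2000)]
    scores = [rng.gauss(1.2 if y else 0.0, 1.0) for y in labels]
    auc, lo, hi = bootstrap_auc_ci(labels, scores, n_boot=500)
    print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

This sketch resamples individual records. Because the dataset described above holds several records per patient, an analysis on real data might instead resample whole patients so that correlated records stay together; whether the authors did so is not stated in the abstract.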