Kyung Ae Lee, Jong Seung Kim, Yu Ji Kim, In Sun Goak, Heung Yong Jin, Seungyong Park, Hyejin Kang, Tae Sun Park
A Machine Learning-Based Prediction Model for Diabetic Kidney Disease in Korean Patients with Type 2 Diabetes Mellitus
Journal of Clinical Medicine, 14(6). Published 2025-03-18. Journal article.
DOI: 10.3390/jcm14062065 (https://doi.org/10.3390/jcm14062065)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11942948/pdf/
Abstract
Background/Objectives: Diabetic kidney disease (DKD) is a major cause of end-stage kidney disease and a leading contributor to morbidity and mortality in patients with type 2 diabetes mellitus (T2DM). However, predictive models for DKD onset in Korean patients with T2DM remain underexplored. This study aimed to develop and validate a machine learning (ML)-based DKD prediction model for this population.
Methods: This retrospective study used electronic health records from six secondary or tertiary hospitals in Korea. The Jeonbuk National University Hospital cohort was used for model development (training-to-test ratio of 8:2), whereas datasets from five other hospitals supported external validation. We employed multiple ML algorithms, including lasso, ridge, and elastic net regression; random forest; XGBoost; support vector machines; and neural networks. The model incorporated demographic variables, comorbidities, medications, and laboratory test results.
Results: Among 5120 patients with T2DM, 1361 (26.6%) developed DKD. In the development cohort, XGBoost achieved the highest predictive performance (AUC: 0.8099), followed by random forest and logistic regression models (AUCs: 0.7977-0.8019). External validation confirmed the model's robustness with high AUCs (XGBoost: 0.8113; logistic regression models: 0.8228-0.8271). Key predictive factors included age; baseline estimated glomerular filtration rate; and creatinine, hemoglobin, and hemoglobin A1c levels.
Conclusions: Our findings highlight the potential of ML-based approaches in predicting DKD in patients with T2DM. The superior performance of XGBoost and logistic regression models underscores their clinical utility. External validation supports the model's generalizability. This model is a valuable tool for the early DKD risk assessment of Korean patients with T2DM.
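To make the evaluation terms in the abstract concrete, the following is a minimal sketch of an 8:2 training/test split and of the AUC computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is not the authors' pipeline: the function names, the random seed, and the toy labels and scores are all hypothetical, and only the Python standard library is used. The last line simply rechecks the DKD proportion reported in the Results (1361 of 5120).

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 8:2 (training:test).

    The seed and the plain (non-stratified) shuffle are assumptions
    made for illustration; the abstract only states the 8:2 ratio.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive case has
    the higher score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed DKD, 0 = did not.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.1]

train_rows, test_rows = split_80_20(range(100))
toy_auc = roc_auc(toy_labels, toy_scores)

# Proportion of patients who developed DKD, from the counts in the Results.
dkd_percent = round(1361 / 5120 * 100, 1)
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; for cohorts of several thousand patients a rank-based implementation from an established library would be the more usual choice.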
About the journal:
Journal of Clinical Medicine (ISSN 2077-0383) is an international scientific open access journal, providing a platform for advances in health care/clinical practices, the study of direct observation of patients and general medical research. This multi-disciplinary journal is aimed at a wide audience of medical researchers and healthcare professionals.
Unique features of this journal:
Manuscripts regarding original research and ideas will be particularly welcomed. JCM also accepts reviews, communications, and short notes.
There is no limit to publication length: our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible.