This study aimed to develop predictive models and a risk scoring system to identify risk factors associated with survival in uterine cancer patients with type 2 diabetes (T2D) and to estimate these patients' survival probabilities.
Data were collected from the Hong Kong Hospital Authority Data Collaboration Laboratory (HADCL) from 2000 to 2020. Cox proportional hazards regression, survival tree, LASSO Cox regression, boosting, and random survival forest (RSF) were used to develop predictive models for survival. Key risk factors were identified through SHapley Additive exPlanations (SHAP) analysis, and the risk scoring system was developed with the AutoScore-Survival package.
This cohort study included 2047 uterine cancer patients with T2D. The mean survival time was 100.82 months (standard deviation: 72.75 months). The RSF model demonstrated the strongest predictive performance, achieving a time-dependent area under the curve (AUC) of 0.823 and a C-index of 0.90. A risk scoring system was built from seven variables: age at cancer diagnosis, duration of T2D, creatinine level, serum potassium level, low-density lipoprotein cholesterol (LDL-C) level, body mass index (BMI), and triglyceride level. This scoring system classified 31.4% of patients as high-risk; their 5-year survival probability was 43.5%, about 1.7 times lower than that of the low-risk group.
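For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration only, not the study's evaluation code: the function name `harrell_c_index` and the toy inputs are assumptions introduced here, and the study may have used a different estimator or software implementation.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- observed follow-up times (e.g. months)
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter time is an observed event rather than censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: correct ranking
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration (not from the study).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a C-index of 0.90 means the model orders about 90% of comparable patient pairs correctly.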
This study used machine learning to identify key survival predictors and develop a clinically interpretable risk scoring system for uterine cancer patients with T2D. Key predictors, including age at cancer diagnosis, duration of T2D, creatinine level, serum potassium level, LDL-C level, BMI, and triglyceride level, effectively stratified survival risk. These findings demonstrate the potential of data-driven models to enhance individualized prediction and inform targeted clinical management.