This study aims to develop and validate machine learning–based diagnostic and prognostic models that predict the risk of distant lymph node metastases (DLNM) in patients with hepatocellular carcinoma (HCC) and estimate prognosis in patients with HCC and DLNM.
This retrospective study analyzes data from the Surveillance, Epidemiology, and End Results (SEER) database (January 2024 subset).
The cohort comprises 15,775 patients in the SEER database diagnosed with HCC between 2016 and 2020.
For the diagnostic model, recursive feature elimination (RFE) is used to select five predictors: age, tumor size, radiation therapy, T stage, and serum alpha-fetoprotein (AFP) level. These variables form the basis of a stacking ensemble model, whose predictions are interpreted with Shapley Additive Explanations (SHAP). For the prognostic model, backward stepwise regression selects chemotherapy, radiation therapy, tumor size, and age, and these variables are used to build a prognostic nomogram based on a Cox proportional hazards model.
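The diagnostic pipeline (RFE down to five predictors, then a stacking ensemble evaluated by AUC) can be sketched as follows. This is a minimal illustration assuming scikit-learn, not the authors' configuration: the synthetic data from `make_classification` stands in for the SEER variables, and the RFE estimator (logistic regression), the base learners (random forest, Gaussian naive Bayes), the logistic meta-learner, and the 70/30 split are all assumed choices. The SHAP interpretation step is not shown.

```python
# Minimal sketch, assuming scikit-learn; synthetic data stands in for SEER variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in data: 12 candidate predictors, imbalanced binary outcome (DLNM yes/no).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# RFE: repeatedly fit the estimator and drop the weakest feature until 5 remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_val_sel = selector.transform(X_val)

# Stacking: out-of-fold probabilities from the base learners (cv=5)
# become the inputs of a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train_sel, y_train)

# Discrimination on the training and validation sets.
auc_train = roc_auc_score(y_train, stack.predict_proba(X_train_sel)[:, 1])
auc_val = roc_auc_score(y_val, stack.predict_proba(X_val_sel)[:, 1])
print(f"selected features: {selector.support_}")
print(f"AUC train={auc_train:.3f} validation={auc_val:.3f}")
```

Training the meta-learner on out-of-fold predictions keeps it from simply trusting base learners that have memorized the training set; the validation AUC is the figure to compare against a held-out cohort.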
The outcome of the diagnostic model is the presence of DLNM; the outcome of the prognostic model is survival, defined by survival time and survival status.
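As a worked formula for the prognostic outcome: each patient $i$ contributes a survival time $t_i$, a status indicator $\delta_i$ (1 if the event was observed, 0 if censored), and a covariate vector $\mathbf{x}_i$, here assumed to hold the selected variables (chemotherapy, radiation therapy, tumor size, age). The standard Cox proportional hazards model and its partial likelihood, written assuming no tied event times, are:

```latex
\begin{aligned}
h(t \mid \mathbf{x}_i) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right) \\
L(\boldsymbol{\beta}) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{j:\,t_j \ge t_i} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_j\right)}
\end{aligned}
```

The baseline hazard $h_0(t)$ cancels from each factor, so $\boldsymbol{\beta}$ is estimated without specifying it. A nomogram then maps each term $\beta_k x_{ik}$ of the fitted linear predictor onto a common points scale and converts the total points into predicted survival probabilities at chosen time points.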
The stacking ensemble model shows good discrimination and interpretability, with an area under the receiver operating characteristic curve (AUC) of 0.767 in the training set and 0.768 in the validation set. The nomogram based on the Cox model also shows consistent and strong predictive performance. In addition, we identify factors that substantially influence DLNM and prognosis and discuss their significance for the models and for clinical practice.
Our study identified key predictive factors for DLNM and significant prognostic indicators for HCC patients with DLNM. These models may help clinicians accurately identify patients at high risk of DLNM and stratify risk more precisely within this subgroup, potentially improving management strategies and patient outcomes.