Development and validation of a small-sample machine learning model to predict 5-year overall survival in patients with hepatocellular carcinoma.
Authors: Tingting Jiang, Xingyu Liu, Wencan He, Hepei Li, Xiang Yan, Qian Yu, Shanjun Mao
Journal: BMC Cancer, 25(1):1040
Published: 2025-07-01
DOI: https://doi.org/10.1186/s12885-025-14425-0
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211751/pdf/
Abstract
Background: Early-onset hepatocellular carcinoma (HCC) is insidious and is characterized by ready metastasis, a high recurrence rate, and significant mortality. Because prognostic prediction for HCC demands substantial time and resources, we used machine learning (ML) on a limited, small-sample dataset to develop and validate a prediction model for 5-year overall survival (OS) in HCC.
Methods: A total of 76 newly diagnosed patients with HCC were enrolled between September 2018 and July 2019. Follow-up ranged from 1 to 67 months. Patients were divided, according to whether they survived for 5 years after the first surgery, into a surviving group (n = 34) and a nonsurviving group (n = 42). Pathological data and survival-related factors were collected before treatment, and a final feature subset was obtained by filtering. Prediction models for 5-year OS in patients with HCC were built with logistic regression (LR), support vector machine (SVM), decision tree classification (DTC), random forest (RF), and extreme gradient boosting (XGBoost), and the optimal model was established after rigorous validation. The models were evaluated by specificity, F1 score, recall, accuracy, and area under the receiver operating characteristic curve (AUC-ROC). Decision curve analysis (DCA) was used to further assess the models. Finally, internal and external validations were performed to confirm the model's robustness.
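Two of the evaluation tools named in the Methods, AUC-ROC and decision curve analysis, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auc_roc` and `net_benefit` and the toy labels and scores are assumptions made for illustration only. AUC is computed as the fraction of positive/negative pairs that the model ranks correctly, and the DCA net benefit at a threshold probability pt uses the standard form TP/n - (FP/n) * pt/(1 - pt).

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at one threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Patients whose predicted risk reaches the threshold are treated as positive.
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical, not from the study): 1 = event, 0 = no event.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(round(auc_roc(toy_labels, toy_scores), 3))
print(round(net_benefit(toy_labels, toy_scores, 0.5), 3))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.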
Results: A set of 22 significant variables was selected. Ranked by importance, these were: maximum diameter, presence or absence of distant metastasis, CNLC stage, ALB, age, RBC, large-size CTC, total bilirubin (TBIL), PD-L1 (-) CTC, ≥ pentaploid CTC, AFP, vascular cancer thrombus and satellite nodules, WBC, CTC, BCLC stage, multiple nodules, AST, PD-L1 (-) CTC-WBC cluster, triploid CTC, LYM, PD-L1 (-) CEC-WBC cluster, and degree of cirrhosis. The AUC-ROC values for predicting 5-year OS in patients with HCC were 0.737, 0.971, 0.657, 0.741, and 0.703 for the LR, SVM, DTC, RF, and XGBoost models, respectively. The SVM model performed best (accuracy = 0.987, F1 score = 0.988, recall = 1.000). The SVM algorithm also showed superior performance and stability in the internal and external validations.
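The relationship between the reported accuracy, F1 score, and recall can be made concrete with a short Python sketch. The confusion-matrix counts below are hypothetical: the abstract does not report them. They assume that the metrics were computed over all 76 patients with the nonsurviving group (n = 42) as the positive class, and they are simply one set of counts whose rounded metrics match the reported SVM figures. The helper name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts (NOT reported in the abstract): 42 nonsurvivors all
# detected, 1 of the 34 survivors misclassified as a nonsurvivor.
svm_like = classification_metrics(tp=42, fp=1, fn=0, tn=33)

for name, value in svm_like.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, accuracy is 75/76, recall is 42/42, and F1 is 84/85, which round to 0.987, 1.000, and 0.988. Other evaluation setups (for example, metrics computed on a held-out subset) could yield the same rounded values, so this is a consistency check rather than a reconstruction of the study's results.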
Conclusion: The SVM model predicted 5-year OS in HCC with good recognition ability and achieved significantly greater accuracy than traditional models. Diagnostic and therapeutic measures could target the risk factors identified in this model, thereby improving patient prognosis.
About the journal:
BMC Cancer is an open access, peer-reviewed journal that considers articles on all aspects of cancer research, including the pathophysiology, prevention, diagnosis and treatment of cancers. The journal welcomes submissions concerning molecular and cellular biology, genetics, epidemiology, and clinical trials.