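Development and validation of a small-sample machine learning model to predict 5-year overall survival in patients with hepatocellular carcinoma.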

IF 3.4 | Zone 2 (Medicine) | Q2 ONCOLOGY
Tingting Jiang, Xingyu Liu, Wencan He, Hepei Li, Xiang Yan, Qian Yu, Shanjun Mao
BMC Cancer, 2025, 25(1): 1040. Published 2025-07-01. DOI: 10.1186/s12885-025-14425-0 (https://doi.org/10.1186/s12885-025-14425-0). Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211751/pdf/
Citations: 0

Abstract


Background: Early-onset hepatocellular carcinoma (HCC) is insidious and is characterized by a tendency to metastasize, a high recurrence rate, and high mortality. To reduce the substantial time and resource demands of HCC prognostic prediction, we extracted meaningful information from limited small-sample data to develop and validate a machine learning (ML) model for predicting 5-year overall survival (OS) in HCC.

Methods: A total of 76 patients with newly diagnosed HCC were enrolled between September 2018 and July 2019, with follow-up ranging from 1 to 67 months. Based on whether they survived 5 years after their first surgery, patients were divided into a surviving group (n = 34) and a non-surviving group (n = 42). Pathological data and survival-related factors were collected before treatment, and the features were then screened to obtain a final subset. Prediction models for 5-year OS were built with logistic regression (LR), support vector machine (SVM), decision tree classification (DTC), random forest (RF), and extreme gradient boosting (XGBoost), and the optimal model was selected after rigorous validation. The models were evaluated by specificity, F1 score, recall, accuracy, and the area under the receiver operating characteristic curve (AUC-ROC), supplemented by decision curve analysis (DCA). Finally, internal and external validation were performed to further assess the model's robustness.
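The Methods name the evaluation metrics and DCA without giving formulas. The following is a minimal, standard-library Python sketch of their usual definitions, not the authors' code. It assumes 0/1 labels with 1 as the positive class (the abstract does not say which outcome was coded as positive). The AUC uses the rank-based (Mann-Whitney) interpretation, and `net_benefit` applies the standard decision-curve formula. All function names are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for 0/1 labels, treating 1 as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, recall (sensitivity), specificity and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "recall": recall, "specificity": specificity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at threshold p_t: TP/N - FP/N * p_t / (1 - p_t)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / n - fp / n * threshold / (1.0 - threshold)
```

In practice, library routines such as scikit-learn's `roc_auc_score` would usually replace these hand-written versions. The plain loops only make each definition explicit. A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds.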

Results: Screening yielded a set of 22 significant variables. Ranked by importance, they were: maximum diameter, presence or absence of distant metastasis, CNLC stage, ALB, age, RBC, large-size CTC, total bilirubin (TBIL), PD-L1 (-) CTC, ≥ pentaploid CTC, AFP, vascular tumor thrombus and satellite nodules, WBC, CTC, BCLC stage, multiple nodules, AST, PD-L1 (-) CTC-WBC cluster, triploid CTC, LYM, PD-L1 (-) CEC-WBC cluster, and degree of cirrhosis. The AUC-ROC values of the LR, SVM, DTC, RF, and XGBoost models for predicting 5-year OS were 0.737, 0.971, 0.657, 0.741, and 0.703, respectively. The SVM model performed best (accuracy = 0.987, F1 score = 0.988, recall = 1.000) and also showed superior performance and stability in internal and external validation.
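The abstract ranks the 22 variables "by importance" but does not say how importance was measured. Permutation importance is one common model-agnostic option. The sketch below assumes that approach and is not the study's procedure. `predict_score` stands for any fitted model's scoring function, and `metric` could be, for example, an AUC function. Both names, like the rest of the helpers, are hypothetical.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def permutation_importance(
    predict_score: Callable[[list[list[float]]], list[float]],
    X: list[list[float]],
    y: list[int],
    metric: Callable[[list[int], list[float]], float],
    n_repeats: int = 10,
    seed: int = 0,
) -> list[float]:
    """Mean drop in `metric` when each feature column is shuffled on its own.

    A larger drop means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict_score(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict_score(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(names: Sequence[str], importances: Sequence[float]) -> list[str]:
    """Return feature names sorted from most to least important."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]
```

Shuffling a column breaks its link to the outcome while preserving its distribution, so the metric drop reflects how much the model depended on it. Averaging over several shuffles (`n_repeats`) reduces the randomness of any single permutation, which matters more when the sample is small.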

Conclusion: The SVM model can predict 5-year OS in HCC with good discriminative ability and achieved significantly higher accuracy than traditional models. Targeting the risk factors identified by this model during diagnosis and treatment could help improve patient prognosis.

Source journal
BMC Cancer (Medicine: Oncology)
CiteScore: 6.00
Self-citation rate: 2.60%
Articles published: 1204
Review time: 6.8 months
Journal description: BMC Cancer is an open access, peer-reviewed journal that considers articles on all aspects of cancer research, including the pathophysiology, prevention, diagnosis and treatment of cancers. The journal welcomes submissions concerning molecular and cellular biology, genetics, epidemiology, and clinical trials.