Leveraging SEER data through machine learning to predict distant lymph node metastasis and prognosticate outcomes in hepatocellular carcinoma patients

IF 2.2 | Zone 4 (Medicine) | Q2 BIOTECHNOLOGY & APPLIED MICROBIOLOGY

Journal of Gene Medicine Pub Date : 2024-08-26 DOI:10.1002/jgm.3732

Jiaxuan Sun, Lei Huang, Yahui Liu


Citations: 0

Abstract

Objectives

This study aims to develop and validate machine learning–based diagnostic and prognostic models that predict the risk of distant lymph node metastasis (DLNM) in patients with hepatocellular carcinoma (HCC) and evaluate prognosis in HCC patients with DLNM.

Design

This retrospective study uses data extracted from the Surveillance, Epidemiology, and End Results (SEER) database, specifically the January 2024 subset.

Participants

The study cohort consists of 15,775 patients diagnosed with HCC as identified within the SEER database, spanning 2016 to 2020.

Method

For the diagnostic model, recursive feature elimination (RFE) is used for variable selection and retains five predictors: age, tumor size, radiation therapy, T-stage, and serum alpha-fetoprotein (AFP) level. These variables feed a stacking ensemble model, which is interpreted using Shapley Additive Explanations (SHAP). For the prognostic model, backward stepwise regression is used to select variables, including chemotherapy, radiation therapy, tumor size, and age. These variables form the basis of a prognostic nomogram built on the Cox proportional hazards model.
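As a rough illustration of the diagnostic pipeline described above, the following sketch chains recursive feature elimination into a stacking ensemble with scikit-learn. Everything specific here is an assumption made for illustration: the data are synthetic (not SEER), the base learners (random forest, logistic regression) and meta-learner are arbitrary choices, and the abstract does not state which learners the authors stacked. The SHAP step is omitted.

```python
# Sketch only: synthetic data and hypothetical learner choices, not the study's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular cohort: 12 candidate predictors, ~10% positives.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# RFE: fit, drop the weakest feature, and repeat until five features remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)

# Stacking: cross-validated predictions of the base learners
# become the inputs of a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

model = make_pipeline(StandardScaler(), selector, stack)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["rfe"].support_.sum())
valid_auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
print(f"selected features: {n_selected}, validation AUC: {valid_auc:.3f}")
```

Placing the selector inside the pipeline means feature selection is refit on the training split only, which avoids leaking validation data into the choice of predictors.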

Main outcome measures

The outcome of the diagnostic model is the occurrence of DLNM. The outcome of the prognostic model is defined by survival time and survival status.
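To make concrete how a survival time and a survival status enter a Cox proportional hazards model, here is a minimal pure-Python sketch of the Cox partial log-likelihood (Breslow form, without a tie correction). The function name and the four-patient toy data are hypothetical and are not taken from the study.

```python
import math

def cox_partial_log_likelihood(beta, covariates, times, events):
    """Cox partial log-likelihood (Breslow form, no tie correction).

    beta       -- list of coefficients, one per covariate
    covariates -- list of rows, one row of covariate values per patient
    times      -- follow-up time per patient
    events     -- 1 if the event was observed, 0 if censored
    """
    # Linear predictor (log relative hazard) for each patient.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += scores[i] - math.log(risk)
    return total

# Toy data (hypothetical): one binary covariate, four patients.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

ll_null = cox_partial_log_likelihood([0.0], covariates, times, events)
ll_alt = cox_partial_log_likelihood([0.5], covariates, times, events)
print(ll_null, ll_alt)
```

With all coefficients at zero, each observed event contributes minus the log of its risk-set size, so the toy value equals -log(4 · 2 · 1) = -log 8. Fitting a Cox model amounts to choosing the coefficients that maximize this quantity; exp(beta) is then the hazard ratio for a one-unit change in the covariate.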

Results

The stacking-based ensemble model shows good predictive performance, interpretability, and discrimination. The area under the curve (AUC) is 0.767 in the training set and 0.768 in the validation set. The nomogram constructed from the Cox model also shows consistent and strong predictive performance. We also identified factors with a substantial impact on DLNM and prognosis and discussed their significance for the model and for clinical practice.
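For readers unfamiliar with the AUC figures reported above, the metric can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name and the four-sample toy data are illustrative assumptions, unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.77 indicate moderate discrimination, and near-identical training and validation values suggest little overfitting.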

Conclusion

Our study identified key predictors of DLNM and significant prognostic indicators for HCC patients with DLNM. These findings give clinicians tools to identify individuals at high risk of DLNM and to stratify risk more precisely in this subgroup, potentially improving management strategies and patient outcomes.




Source journal

Journal of Gene Medicine (Medicine: Biotechnology & Applied Microbiology)

CiteScore

6.40

Self-citation rate

0.00%

Articles published

Review time

6-12 weeks

Journal description: The aims and scope of The Journal of Gene Medicine include cutting-edge science of gene transfer and its applications in gene and cell therapy, genome editing with precision nucleases, epigenetic modifications of host genome by small molecules, siRNA, microRNA and other noncoding RNAs as therapeutic gene-modulating agents or targets, biomarkers for precision medicine, and gene-based prognostic/diagnostic studies. Key areas of interest are the design of novel synthetic and viral vectors, novel therapeutic nucleic acids such as mRNA, modified microRNAs and siRNAs, antagomirs, aptamers, antisense and exon-skipping agents, refined genome editing tools using nucleic acid/protein combinations, physically or biologically targeted delivery and gene modulation, ex vivo or in vivo pharmacological studies including animal models, and human clinical trials. Papers presenting research into the mechanisms underlying transfer and action of gene medicines, the application of the new technologies for stem cell modification or nucleic acid-based vaccines, the identification of new genetic or epigenetic variations as biomarkers to direct precision medicine, and the preclinical/clinical development of gene/expression signatures indicative of diagnosis or predictive of prognosis are also encouraged.