Machine learning-based prognosis prediction for serous ovarian cancer using the SEER database and data from a single center in China

IF 1.7 · Zone 4 (Medicine) · JCR Q4 ONCOLOGY

Translational cancer research · Pub Date: 2025-08-31 · Epub Date: 2025-08-14 · DOI: 10.21037/tcr-2025-540

Huan Chen, Yuexing Zhao, Qian Sun, Pei Jiao

Translational cancer research, vol. 14, no. 8, pp. 4703-4719. Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12432648/pdf/

Citations: 0

Abstract



Machine learning-based prognosis prediction for serous ovarian cancer using the SEER database and data from a single center in China.

Background: Ovarian cancer, particularly serous ovarian cancer, is the leading cause of death among gynecological malignancies. Despite advances in treatment, prognosis remains poor due to the tumor's heterogeneity and the frequent late-stage diagnosis, making survival a critical concern for patients. However, there is a lack of accurate clinical prognostic models to guide treatment decisions. Therefore, this study aimed to develop and validate a robust prognostic model for serous ovarian cancer using machine learning.

Methods: Data for this study were obtained from the Surveillance, Epidemiology, and End Results (SEER) database (2010-2021) and Yancheng Dafeng People's Hospital (2012-2020). We used univariate and multivariate Cox regression analyses to identify independent risk factors and constructed a Light Gradient Boosting Machine (LightGBM) model with 10-fold cross-validation and hyperparameter tuning. The model's performance was evaluated using area under the receiver operating characteristic curve (ROC-AUC), feature importance rankings, and confusion matrices.
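The evaluation described in the Methods can be illustrated with a minimal, standard-library sketch of a horizon-specific ROC-AUC. The abstract does not say how outcomes were defined at 6, 12, 24, and 36 months, so the labelling rule here is an assumption: a patient counts as a positive if death occurred by the horizon, as a negative if follow-up extends beyond it, and is excluded if censored earlier. The names `label_at_horizon`, `roc_auc`, and `auc_at_horizon`, and the toy records, are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def label_at_horizon(months: float, died: bool, horizon: float) -> Optional[int]:
    """Return 1 if death occurred by `horizon`, 0 if follow-up exceeds it, else None."""
    if died and months <= horizon:
        return 1
    if months > horizon:
        return 0  # still alive (or died later) once the horizon has passed
    return None  # censored at or before the horizon: outcome unknown (assumed rule)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_at_horizon(records: Sequence[Tuple[float, bool, float]], horizon: float) -> float:
    """records: (follow-up months, died flag, predicted risk score) per patient."""
    labels: List[int] = []
    scores: List[float] = []
    for months, died, risk in records:
        y = label_at_horizon(months, died, horizon)
        if y is not None:  # drop patients whose status at the horizon is unknown
            labels.append(y)
            scores.append(risk)
    return roc_auc(labels, scores)


# Toy data only; not from SEER or the hospital cohort.
toy = [(3, True, 0.9), (8, True, 0.3), (4, False, 0.6),
       (20, False, 0.2), (30, True, 0.4), (40, False, 0.1)]
for h in (6, 12, 24, 36):
    print(h, round(auc_at_horizon(toy, h), 3))
```

In a real pipeline the risk scores would come from the fitted model's held-out predictions in each cross-validation fold; excluding early-censored patients is one simple convention, and time-dependent AUC estimators handle censoring more carefully.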

Results: A total of 7,916 cases from the SEER database and 163 cases from Yancheng Dafeng People's Hospital were included in the analysis. The LightGBM model outperformed other machine learning models, with ROC-AUC values of 0.902 [95% confidence interval (CI): 0.881-0.923], 0.863 (95% CI: 0.841-0.886), 0.814 (95% CI: 0.794-0.835), and 0.816 (95% CI: 0.796-0.835) at 6, 12, 24, and 36 months, respectively, in the test set. Additionally, the model maintained robust performance in external validation, with ROC-AUC values of 0.821 (95% CI: 0.718-0.923), 0.785 (95% CI: 0.698-0.871), 0.745 (95% CI: 0.669-0.821), and 0.790 (95% CI: 0.722-0.858) at 6, 12, 24, and 36 months, respectively. We also identified surgery as the most significant predictor of survival, followed by chemotherapy, in ovarian cancer patients.
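The wider intervals in external validation (163 cases) compared with the SEER test set reflect sample size. A percentile bootstrap makes this concrete. The abstract does not state how the 95% CIs were obtained, so the bootstrap here is an assumption (DeLong's method is a common alternative); the function names and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 500, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Return (point AUC, lower bound, upper bound) from a percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    point = roc_auc(labels, scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


# Toy labels (1 = died by the horizon) and risk scores; not study data.
toy_labels = [1] * 10 + [0] * 10
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3,
              0.65, 0.52, 0.42, 0.32, 0.25, 0.2, 0.15, 0.1, 0.05, 0.02]
point, lower, upper = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC {point:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

With only 20 toy patients the interval is wide; the same procedure applied to thousands of cases would give a much narrower one, which matches the pattern between the external-validation and test-set intervals reported above.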

Conclusions: We utilized the LightGBM model to predict survival in ovarian cancer patients, demonstrating excellent prognostic accuracy and high reproducibility. This model provides a valuable tool for guiding clinical decision-making and optimizing treatment strategies. Future research is needed to further validate its applicability across different populations.


Source journal: Translational cancer research (ONCOLOGY)

CiteScore: 2.10

Self-citation rate: 0.00%

Articles published: 252

About the journal: Translational Cancer Research (Transl Cancer Res TCR; Print ISSN: 2218-676X; Online ISSN: 2219-6803; http://tcr.amegroups.com/) is an Open Access, peer-reviewed journal, indexed in Science Citation Index Expanded (SCIE). TCR publishes laboratory studies of novel therapeutic interventions as well as clinical trials that evaluate new treatment paradigms for cancer, and results of novel research investigations that bridge the laboratory and clinical settings, including risk assessment, cellular and molecular characterization, prevention, detection, diagnosis and treatment of human cancers, with the overall goal of improving the clinical care of cancer patients. The focus of TCR is original, peer-reviewed, science-based research that successfully advances clinical medicine toward the goal of improving patients' quality of life. The editors and an international advisory group of scientists and clinician-scientists, as well as other experts, will hold TCR articles to high-quality standards. We accept Original Articles as well as Review Articles, Editorials and Brief Articles.