Machine learning-based prognosis prediction for serous ovarian cancer using the SEER database and data from a single center in China
Authors: Huan Chen, Yuexing Zhao, Qian Sun, Pei Jiao
Journal: Translational Cancer Research, 14(8):4703-4719. Published 2025-08-31 (Epub 2025-08-14). Publication type: Journal Article.
DOI: https://doi.org/10.21037/tcr-2025-540
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12432648/pdf/
Background: Ovarian cancer, particularly serous ovarian cancer, is the leading cause of death among gynecological malignancies. Despite advances in treatment, prognosis remains poor due to the tumor's heterogeneity and the frequent late-stage diagnosis, making survival a critical concern for patients. However, there is a lack of accurate clinical prognostic models to guide treatment decisions. Therefore, this study aimed to develop and validate a robust prognostic model for serous ovarian cancer using machine learning.
Methods: Data for this study were obtained from the Surveillance, Epidemiology, and End Results (SEER) database (2010-2021) and Yancheng Dafeng People's Hospital (2012-2020). We used univariate and multivariate Cox regression analyses to identify independent risk factors and constructed a Light Gradient Boosting Machine (LightGBM) model with 10-fold cross-validation and hyperparameter tuning. The model's performance was evaluated using area under the receiver operating characteristic curve (ROC-AUC), feature importance rankings, and confusion matrices.
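The 10-fold cross-validation mentioned in the Methods can be illustrated with a small sketch. The abstract does not say how the folds were built, so the stratified, round-robin assignment below is an assumption, and the function name `stratified_kfold` and the toy labels are hypothetical. It uses only the Python standard library and shows the splitting step, not the LightGBM fit itself.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class balance similar in each fold.

    Assumption: stratification by outcome label; the abstract only states
    that 10-fold cross-validation was used.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Shuffle within each class, then deal indices round-robin across folds
    # so every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)

    # Each fold serves once as the held-out set; the rest form the training set.
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Toy labels (hypothetical): 1 = died before the horizon, 0 = survived past it.
labels = [1] * 30 + [0] * 70
splits = list(stratified_kfold(labels, k=10, seed=42))

for fold_no, (train, test) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test)
    print(f"fold {fold_no}: train={len(train)} test={len(test)} positives={positives}")
```

In practice a model would be fitted on each `train` set and scored on the matching `test` set, with hyperparameters chosen by the average held-out score.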
Results: A total of 7,916 cases from the SEER database and 163 cases from Yancheng Dafeng People's Hospital were included in the analysis. The LightGBM model outperformed other machine learning models, with ROC-AUC values of 0.902 [95% confidence interval (CI): 0.881-0.923], 0.863 (95% CI: 0.841-0.886), 0.814 (95% CI: 0.794-0.835), and 0.816 (95% CI: 0.796-0.835) at 6, 12, 24, and 36 months, respectively, in the test set. Additionally, the model maintained robust performance in external validation, with ROC-AUC values of 0.821 (95% CI: 0.718-0.923), 0.785 (95% CI: 0.698-0.871), 0.745 (95% CI: 0.669-0.821), and 0.790 (95% CI: 0.722-0.858) at 6, 12, 24, and 36 months, respectively. We also identified surgery as the most significant predictor of survival, followed by chemotherapy, in ovarian cancer patients.
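To make the reported metric concrete, the following sketch computes a ROC-AUC and a 95% confidence interval for one time horizon. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption (DeLong's method is another common choice), and the toy labels and scores are invented for illustration only. They are not study data.

```python
import random


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap interval for ROC-AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example (hypothetical): 1 = death within the horizon, score = predicted risk.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

auc = roc_auc(y_true, scores)
ci_low, ci_high = bootstrap_ci(y_true, scores, n_boot=1000, seed=1)
print(f"AUC = {auc:.4f}, 95% CI: {ci_low:.3f}-{ci_high:.3f}")
```

The same calculation would be repeated separately for the 6-, 12-, 24-, and 36-month outcomes. Intervals are wider when fewer cases are available, which is consistent with the wider intervals reported for the 163-case external cohort than for the SEER test set.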
Conclusions: We utilized the LightGBM model to predict survival in ovarian cancer patients, demonstrating excellent prognostic accuracy and high reproducibility. This model provides a valuable tool for guiding clinical decision-making and optimizing treatment strategies. Future research is needed to further validate its applicability across different populations.
About the journal:
Translational Cancer Research (Transl Cancer Res; TCR; Print ISSN: 2218-676X; Online ISSN: 2219-6803; http://tcr.amegroups.com/) is an Open Access, peer-reviewed journal indexed in Science Citation Index Expanded (SCIE). TCR publishes laboratory studies of novel therapeutic interventions and clinical trials that evaluate new treatment paradigms for cancer. It also publishes novel research that bridges the laboratory and clinical settings, including risk assessment, cellular and molecular characterization, prevention, detection, diagnosis, and treatment of human cancers, with the overall goal of improving the clinical care of cancer patients. The focus of TCR is original, peer-reviewed, science-based research that advances clinical medicine toward the goal of improving patients' quality of life. The editors and an international advisory group of scientists, clinician-scientists, and other experts hold TCR articles to high-quality standards. The journal accepts Original Articles as well as Review Articles, Editorials, and Brief Articles.