Xian Gong, Maojie Pan, Yuxing Lin, Xiaoxuan Ye, Jiekun Qian, Guoliang Liao, Jianting Du, Bin Zheng, Chun Chen, Zhang Yang
Prognostic models for large cell neuroendocrine lung carcinoma: a machine learning and regression approach. Translational Lung Cancer Research, 14(7), 2470-2482. Published 2025-07-31 (Epub 2025-07-28). DOI: 10.21037/tlcr-2025-130
Citations: 0
Abstract
Background: Large cell neuroendocrine lung carcinoma (LCNEC) is a rare and aggressive subtype of lung cancer with high rates of lymph node metastasis (60-80%) and distant metastasis (40%) at diagnosis. This study aimed to develop and evaluate 5-year survival prognostic models for patients with LCNEC, comparing the traditional Cox proportional hazards regression model with machine learning approaches, including Gradient Boosting, XGBoost, Random Survival Forests, Extra Survival Trees, and Neural Networks.
Methods: This retrospective cohort study used data from the Surveillance, Epidemiology, and End Results (SEER) database (2000-2021), including 6,062 patients with pathologically confirmed LCNEC. The primary outcome was the 5-year survival probability. The study employed regression and machine learning approaches; the data were divided into training and testing sets by year of diagnosis, and four stratification variables were analyzed. Internal-external cross-validation assessed model performance, while decision curve analysis (DCA) evaluated clinical utility.
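As an illustration of what decision curve analysis measures, the following minimal sketch computes net benefit at a single threshold probability. It is not the authors' code: it assumes a binary outcome with no censoring (the study's survival setting would need a censoring-aware estimate), and the function names and toy data are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging every patient whose predicted risk >= threshold.

    Simplifying assumption: y_true is a fully observed binary outcome
    (1 = event, 0 = no event), so censoring is ignored.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged patients who had the event.
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    # False positives: flagged patients who did not have the event.
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data, for illustration only.
outcomes = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.6, 0.1]
model_nb = net_benefit(outcomes, predicted, 0.2)
all_nb = treat_all_net_benefit(outcomes, 0.2)
```

A decision curve plots this quantity over a range of thresholds; a model is considered useful where its curve lies above both the treat-all reference and the treat-none reference (net benefit of zero).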
Results: The Gradient Boosting model showed better discrimination than all other models and achieved the best pooled metrics: Harrell's C-index of 0.799, Brier score of 0.047, calibration slope of 1.126, and calibration-in-the-large of 0.155. SHAP value analysis identified chemotherapy as one of the most influential predictors of survival outcomes in LCNEC patients, highlighting its potential clinical importance in guiding treatment strategies for this population. DCA confirmed the superior clinical utility of the Gradient Boosting model.
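For readers unfamiliar with the discrimination metric reported above, here is a minimal sketch of Harrell's C-index for right-censored data. It is an illustrative implementation, not the authors' pipeline: it skips pairs with tied survival times as a simplification, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def harrell_c_index(time: Sequence[float], event: Sequence[int], risk: Sequence[float]) -> float:
    """Fraction of comparable patient pairs that the risk scores order correctly.

    A pair (i, j) is comparable when patient i had an observed event
    (event[i] == 1) strictly before patient j's follow-up time.
    Higher risk is expected to mean shorter survival.
    Simplification: pairs with tied times are not counted.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event had the higher risk score
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the third patient is censored at time 6.
c = harrell_c_index(time=[2, 4, 6, 8], event=[1, 1, 0, 1], risk=[0.9, 0.7, 0.8, 0.1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported 0.799.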
Conclusions: Gradient Boosting exhibited excellent predictive accuracy and clinical utility, demonstrating its potential for prognostic evaluation for LCNEC patients.
About the journal:
Translational Lung Cancer Research (TLCR, Transl Lung Cancer Res, Print ISSN 2218-6751; Online ISSN 2226-4477) is an international, peer-reviewed, open-access journal founded in March 2012. TLCR is indexed by PubMed/PubMed Central and the Chemical Abstracts Service (CAS) Databases. It was published quarterly in its first year and has been published bimonthly since February 2013. It provides practical, up-to-date information on the prevention, early detection, diagnosis, and treatment of lung cancer. Specific areas of interest include, but are not limited to, multimodality therapy, markers, imaging, tumor biology, pathology, chemoprevention, and technical advances related to lung cancer.