Machine learning-driven gene expression profiling for lung cancer stage determination.
Yinbo Wang, Kai Fu. Cancer Biomarkers: Section A of Disease Markers 42(9):18758592251367223. Journal article, published 2025-09-01 (Epub 2025-09-12). DOI: 10.1177/18758592251367223
Background: Lung cancer remains a leading cause of cancer-related mortality, and accurate staging is essential for guiding treatment. Advances in next-generation sequencing (NGS) and machine learning (ML) enable more precise classification, improving on traditional imaging-based methods.

Objective: This retrospective study applies XGBoost with cross-validation (CV) to classify early- vs. late-stage lung cancer using RNA-Seq data from 993 patients in The Cancer Genome Atlas (TCGA) cohort.

Methods: Genes were selected with the Wilcoxon rank-sum test on the training data, and the XGBoost model was optimized via cross-validation. Model performance was assessed by the area under the receiver operating characteristic curve (AUC), with sensitivity and specificity analyzed across classification thresholds.

Results: The XGBoost model achieved a test AUC of 0.6534 and identified 40 key genes that optimize predictive accuracy while minimizing overfitting. Thresholds of 0.3 and 0.4 were optimal, balancing sensitivity and specificity for clinical application.

Conclusions: Integrating RNA-Seq data with machine learning improves lung cancer staging accuracy. Future research should focus on dataset expansion, model benchmarking, and multi-omics integration to enhance clinical applicability.
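The gene-selection step (a Wilcoxon rank-sum test on training data) can be sketched as follows. This is a minimal pure-Python sketch, not the authors' code, and several pieces are assumptions. Samples are rows and genes are columns. The labels use a hypothetical 0 = early / 1 = late encoding. The test uses the large-sample normal approximation with tie correction. Genes are ranked by p-value and the top `k` are kept. The helper names `average_ranks`, `rank_sum_pvalue`, and `select_genes` are illustrative. The abstract reports 40 key genes but does not state exactly how they were picked from the test results. What the sketch does preserve is that only training samples enter the test, so the held-out set cannot leak into gene selection.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_pvalue(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])                # rank sum of group_a
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2          # expected rank sum under H0
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        return 1.0  # every value identical: no evidence of a shift
    z = (w - mean_w) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def select_genes(X_train, y_train, k: int) -> list[int]:
    """Rank genes by rank-sum p-value on TRAINING samples only; return top-k column indices."""
    scored = []
    for g in range(len(X_train[0])):
        early = [row[g] for row, label in zip(X_train, y_train) if label == 0]
        late = [row[g] for row, label in zip(X_train, y_train) if label == 1]
        scored.append((rank_sum_pvalue(early, late), g))
    scored.sort()  # smallest p-value first; ties broken by column index
    return [g for _, g in scored[:k]]
```

For the real data, `select_genes(X_train, y_train, 40)` would return 40 column indices to feed the classifier. `scipy.stats.mannwhitneyu` computes the equivalent Mann-Whitney U test. Its optional continuity correction means its p-values can differ slightly from this sketch.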
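The evaluation described in Methods and Results (test AUC plus sensitivity and specificity at several cut-offs) can be sketched in plain Python. The scores are assumed to be predicted late-stage probabilities on the held-out test set, for example from the `predict_proba` method of xgboost's `XGBClassifier`. The cut-offs 0.3 and 0.4 come from the abstract; 0.5 is added as the conventional default for comparison. Picking a threshold by Youden's J (sensitivity + specificity - 1) is an assumed criterion, since the abstract does not name one. The function names are illustrative.

```python
def roc_auc(y_true, scores):
    """ROC AUC: probability that a random late-stage (1) case scores higher than a
    random early-stage (0) case, counting ties as 1/2 (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairs; fine for a few hundred test cases
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when 'late stage' is predicted for score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_late = s >= threshold
        if y == 1:
            if predicted_late:
                tp += 1
            else:
                fn += 1
        elif predicted_late:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def threshold_table(y_true, scores, thresholds=(0.3, 0.4, 0.5)):
    """Map each candidate cut-off to its (sensitivity, specificity) pair."""
    return {t: sens_spec(y_true, scores, t) for t in thresholds}


def best_by_youden(y_true, scores, thresholds=(0.3, 0.4, 0.5)):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    return max(thresholds, key=lambda t: sum(sens_spec(y_true, scores, t)) - 1)
```

Lowering the cut-off below 0.5 labels more cases as late stage. That raises sensitivity at the cost of specificity, and a table like this makes the trade-off explicit for each candidate threshold.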