Chao Du, Qi Liu, Yuanyuan Guo, Jun Gong, Ling Yan, Zhijie Li, Changchun Niu
Cancer Reports, vol. 8, no. 10. Published 2025-10-06. DOI: 10.1002/cnr2.70350
https://onlinelibrary.wiley.com/doi/10.1002/cnr2.70350
Prediction of Lung Cancer Metastasis Using Machine Learning Models Based on Clinical Laboratory Data
Background
Lymph node (N) and/or distant metastasis in lung cancer indicates a poorer prognosis. Laboratory tests and computed tomography (CT) scans reflect tumor growth and metabolic activity, but they usually must be combined with other diagnostic methods to assess metastasis effectively, which limits the clinical use of these results on their own.
Aims
To develop machine learning models that use diverse clinical laboratory data to predict lymph node invasion and skip N metastasis in lung cancer.
Methods
This study performed regression analysis on lung cancer cases initially diagnosed by histopathology, categorized by TNM stage into N and M (skip N metastasis) groups. Laboratory and clinical test results were collected as characteristic parameters. Univariate analysis and LASSO regression were used to identify key predictors, and four machine learning algorithms were used to develop the models.
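The selection-then-modeling workflow described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the univariate test is an ANOVA F-test chosen as a placeholder, the LASSO step is approximated by L1-penalized logistic regression, and the penalty strength `C=0.1` and the 70/30 split are arbitrary. The abstract names only logistic regression among the four algorithms, so only that model is fitted here.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the laboratory/clinical table (NOT the study's data):
# 600 cases, 30 parameters, of which only the first 5 carry signal.
n_cases, n_params = 600, 30
X = rng.normal(size=(n_cases, n_params))
logit = X[:, :5] @ np.array([1.2, -1.0, 0.8, 0.6, -0.5])
y = (rng.random(n_cases) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Hold out a test cohort (the 70/30 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: univariate screening at p < 0.05 (F-test used as a placeholder).
_, p_values = f_classif(X_train, y_train)
screened = np.flatnonzero(p_values < 0.05)

# Step 2: L1-penalized (LASSO-style) logistic regression on standardized
# features; parameters with nonzero coefficients are kept.
scaler = StandardScaler().fit(X_train[:, screened])
Z_train = scaler.transform(X_train[:, screened])
Z_test = scaler.transform(X_test[:, screened])
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(Z_train, y_train)
kept = np.flatnonzero(lasso.coef_[0] != 0)
selected = screened[kept]

# Step 3: refit a plain logistic regression on the selected predictors and
# score it on the held-out test cohort.
model = LogisticRegression(max_iter=1000).fit(Z_train[:, kept], y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(Z_test[:, kept])[:, 1])

print("screened:", len(screened), "selected:", len(selected))
print("test AUC:", round(test_auc, 3))
```

Fitting the scaler and both selection steps on the training split only keeps the test cohort untouched until the final scoring, which is what makes the reported AUC an out-of-sample estimate.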
Results
Of the 1629 cases analyzed, 861 were assigned to the N group and 519 to the M group. Univariate analysis revealed significant differences in 40 parameters in the N group and 27 parameters in the M group (p < 0.05). LASSO regression identified 13 characteristic factors for the N group and 12 for the M group. In the N group, the factors were tumor size, prothrombin time (PT), mean platelet volume, fibrinogen, platelet count, procalcitonin, carbohydrate antigen 15–3 (CA 15–3), carcinoembryonic antigen (CEA), adenosine deaminase, red blood cell distribution width, thrombin time, smoking history, and alcohol consumption history. In the M group, the factors were cytokeratin 19 fragment, tumor size, CEA, CA 15–3, squamous cell carcinoma antigen (SCCA), alkaline phosphatase, fibrinogen, hemoglobin, calcium, albumin, PT, and absolute monocyte count. In the test cohort, logistic regression was the best-performing model for both groups, achieving AUC values of 0.888 and 0.875 for the N and M groups, respectively.
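For readers less familiar with the metric, the AUC reported above can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison on a toy example; the function name `pairwise_auc` and the labels and scores are hypothetical and unrelated to the study's data.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 positive and 3 negative cases (made-up scores).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.8, 0.4, 0.3, 0.2]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))  # 7 of 9 pairs are ranked correctly -> 0.778
```

The pairwise form is quadratic in the number of cases, so it is meant for illustration; library routines compute the same quantity from ranks.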
Conclusion
The study demonstrated the potential of combining machine learning (ML) algorithms with laboratory data and clinical features to predict N involvement and skip N metastasis in lung cancer.