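Machine learning algorithms for predicting malignancy grades of lung adenocarcinoma and guiding treatments: CT radiomics-based comparisons.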

IF 2.1, Zone 3 (Medicine), JCR Q3, RESPIRATORY SYSTEM

Journal of Thoracic Disease, Pub Date: 2025-04-30, Epub Date: 2025-04-28, DOI: 10.21037/jtd-2025-310

Jun Zhu, Jiayu Tao, Fengfeng Zhang, Jie Yao, Ke Chen, Yuxuan Wang, Xiaochen Lu, Bin Ni, Maoshan Zhu

Journal of Thoracic Disease, volume 17(4), pages 2423-2440. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12090144/pdf/

Citations: 0

Abstract


Machine learning algorithms for predicting malignancy grades of lung adenocarcinoma and guiding treatments: CT radiomics-based comparisons.

Background: Lung adenocarcinoma (LUAD) is the most frequently diagnosed subtype of non-small cell lung cancer (NSCLC). Notably, prognosis can vary significantly among LUAD patients with different tumor subtypes. The advent of radiomics and machine learning (ML) technologies enables the development of non-invasive pathology predictive models. We attempted to develop computed tomography (CT) radiomics-based diagnostic models, enhanced by ML, to predict LUAD malignancy grade and guide surgical strategies.

Methods: In this retrospective analysis, 168 surgical patients with histologically confirmed LUAD were divided into a low-risk group (n=93) and an intermediate-to-high-risk group (n=75) based on postoperative pathology. The region of interest (ROI) was delineated on the preoperative CT images of all patients, and radiomic features were then extracted. Patients were randomly allocated to a training set (n=117) and a testing set (n=51) in a 7:3 ratio. Within the training set, a clinical-radiological model (CM) and a radiomics model (RM) were developed using the patients' clinical characteristics, radiological semantic features, and radiomic features, and Rad scores were calculated. After the Rad scores were combined with the independent risk factors among the clinical-radiological features, logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), K-nearest neighbors (KNN), and naïve Bayes model (NBM) were used to create different comprehensive models (COMs). The optimal model was identified based on the receiver operating characteristic (ROC) curves and the DeLong test. Finally, Shapley additive explanations (SHAP) were used to visualize the predictive processes of the models.
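As an illustration of the allocation step described in the Methods, the sketch below shuffles 168 hypothetical patient indices and holds out 30% of them for testing. The abstract does not state the software, the random seed, or whether the split was stratified by risk group; the function name `split_7_3`, the seed, and the choice to round the test-set size up are assumptions, chosen only because they reproduce the reported 117/51 allocation. The `auc` helper is the standard rank-based (Mann-Whitney) definition of the area under the ROC curve, applied here to made-up scores rather than to any data from the study.

```python
import math
import random


def split_7_3(ids, seed=0):
    """Shuffle case ids and split them roughly 7:3 (test size rounded up).

    Hypothetical helper: the seed and rounding rule are assumptions.
    Returns (train_ids, test_ids).
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_test = math.ceil(0.3 * len(ids))
    return ids[n_test:], ids[:n_test]


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


train_ids, test_ids = split_7_3(range(168))
print(len(train_ids), len(test_ids))  # 117 51

# Made-up scores for three positive and three negative cases.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The pairwise loop is quadratic in the number of cases, which is harmless at the sample sizes reported here (117 and 51); a rank-based formulation would be preferable for much larger cohorts.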

Results: Among the 168 patients enrolled, there were 50 males (29.76%) aged 56 (49.25, 67.00) years and 118 females (70.24%) aged 56.5 (42.00, 64.00) years. Diameter (P<0.001) and consolidation-to-tumor ratio (CTR) ≥0.5 (P=0.002) were identified as independent risk factors for the malignancy grade of LUAD during CM creation. The CM had an area under the ROC curve (AUC) of 0.909 [95% confidence interval (CI): 0.856-0.962] in the training set and 0.920 (95% CI: 0.846-0.994) in the testing set. The RM, comprising seven radiomic features, achieved an AUC of 0.961 (95% CI: 0.926-0.996) in the training set and 0.957 (95% CI: 0.905-1.000) in the testing set. Among the models created using the various ML algorithms, the XGBoost model was identified as the optimal model. SHAP visualization revealed the model's prediction process and the values of the different features.
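The Methods name the DeLong test for comparing ROC curves, and the intervals reported above are symmetric about the AUC (for example 0.909 ± 0.053), which is consistent with a normal-approximation interval, although the abstract does not say how they were computed. The sketch below is a minimal pure-Python version of the DeLong variance estimate and the paired DeLong test under that assumption. All function names, the toy scores, and the clipping of the interval to [0, 1] are illustrative choices, not the authors' implementation, and none of the numbers it prints relate to the study.

```python
import math


def _psi(x, y):
    """Pairwise kernel: 1 if the positive outscores the negative, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """Per-positive (V10) and per-negative (V01) structural components."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance (n - 1 denominator); _cov(a, a) is the variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong(pos_a, neg_a, pos_b, neg_b):
    """Paired DeLong comparison of two models scored on the same cases.

    pos_* / neg_* hold each model's scores for the positive / negative
    cases, listed in the same case order for both models.
    """
    m, n = len(pos_a), len(neg_a)
    v10a, v01a = _components(pos_a, neg_a)
    v10b, v01b = _components(pos_b, neg_b)
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    var_a = _cov(v10a, v10a) / m + _cov(v01a, v01a) / n
    var_b = _cov(v10b, v10b) / m + _cov(v01b, v01b) / n
    cov_ab = _cov(v10a, v10b) / m + _cov(v01a, v01b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    # Guard against identical models, where the difference has no variance.
    z = (auc_a - auc_b) / math.sqrt(var_diff) if var_diff > 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"auc_a": auc_a, "auc_b": auc_b, "var_a": var_a,
            "var_b": var_b, "z": z, "p": p}


def wald_ci(auc, var, z=1.96):
    """Normal-approximation 95% interval, clipped to [0, 1] (an assumption)."""
    half = z * math.sqrt(var)
    return max(0.0, auc - half), min(1.0, auc + half)


# Toy scores for four positive and four negative cases under two models.
pos_a, neg_a = [0.9, 0.8, 0.6, 0.55], [0.7, 0.4, 0.3, 0.2]
pos_b, neg_b = [0.8, 0.5, 0.7, 0.3], [0.6, 0.45, 0.35, 0.1]

result = delong(pos_a, neg_a, pos_b, neg_b)
ci_a = wald_ci(result["auc_a"], result["var_a"])
print(result["auc_a"], result["auc_b"])  # 0.875 0.75
print(ci_a, result["z"], result["p"])
```

Because both models are scored on the same cases, the covariance term matters: dropping it and treating the two AUCs as independent would generally overstate the variance of their difference and make the test too conservative.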

Conclusions: We constructed and validated a robust, integrative model based on ML and CT radiomics that combines radiomic, clinical, and radiological attributes to identify LUADs with elevated postoperative pathological grades. This enables doctors to formulate different surgical plans before the operation according to the pathology of the patients' tumors.


Source journal

Journal of Thoracic Disease (RESPIRATORY SYSTEM)

CiteScore: 4.60

Self-citation rate: 4.00%

Number of publications: 254

About the journal: The Journal of Thoracic Disease (JTD, J Thorac Dis; pISSN: 2072-1439; eISSN: 2077-6624) was founded in Dec 2009 and was indexed in PubMed in Dec 2011 and in the Science Citation Index (SCI) in Feb 2013. It was published quarterly (Dec 2009-Dec 2011), then bimonthly (Jan 2012-Dec 2013), and has been published monthly since Jan 2014; it is openly distributed worldwide. JTD received an impact factor of 2.365 for the year 2016. JTD publishes manuscripts that describe new findings and provide current, practical information on the diagnosis and treatment of conditions related to thoracic disease. Submission and review are conducted entirely electronically so that rapid review is assured.