Machine Learning Model for Predicting Pathological Invasiveness of Pulmonary Ground-Glass Nodules Based on AI-Extracted Radiomic Features
Guozhen Yang, Yuanheng Huang, Huiguo Chen, Weibin Wu, Yonghui Wu, Kai Zhang, Xiaojun Li, Jiannan Xu, Jian Zhang
Thoracic Cancer, 16(15): e70128 (published 2025-08-01). DOI: 10.1111/1759-7714.70128
Abstract
Background: With the widespread adoption of low-dose CT screening, the detection of pulmonary ground-glass nodules (GGNs) has risen markedly, presenting diagnostic challenges in distinguishing preinvasive lesions from invasive adenocarcinomas (IAC). This study aimed to develop a machine learning (ML)-based model using artificial intelligence (AI)-extracted CT radiomic features to predict the invasiveness of GGNs.
Methods: A retrospective cohort of 285 patients (148 with preinvasive lesions, 137 with IAC) from the Lingnan Campus was divided into training and internal validation sets at a ratio of 8:2. An independent cohort of 210 patients (118 with preinvasive lesions, 92 with IAC) from the Tianhe Campus served as the external validation set. Nineteen radiomic features were extracted and then filtered using the Boruta and LASSO algorithms. Seven ML classifiers were compared using the area under the receiver operating characteristic curve (AUC-ROC) and decision curve analysis (DCA), and were interpreted with SHapley Additive exPlanations (SHAP).
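The two evaluation criteria named in the Methods can be made concrete with a short sketch. This is not the authors' code: it assumes binary labels (1 = IAC, 0 = preinvasive) and predicted probabilities from any classifier, and the function names `roc_auc`, `net_benefit`, and `treat_all_net_benefit` are hypothetical. AUC is computed in its Mann-Whitney form (the probability that a randomly chosen IAC case scores above a randomly chosen preinvasive case, ties counted as one half), and net benefit follows the standard decision-curve definition NB = TP/n - (FP/n) × p_t/(1 - p_t) at threshold probability p_t.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties = 0.5.

    Equivalent to the area under the empirical ROC curve (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating every case scored >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(labels: Sequence[int], threshold: float) -> float:
    """Reference 'treat all' strategy; 'treat none' has net benefit 0."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data for illustration only; not from the study cohorts.
    y = [0, 0, 0, 1, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80, 0.70, 0.30]
    print(round(roc_auc(y, p), 3))  # 0.778 (7 of 9 pairs ranked correctly)
    for pt in (0.25, 0.50):
        print(pt, round(net_benefit(y, p, pt), 3), round(treat_all_net_benefit(y, pt), 3))
```

A decision curve is simply `net_benefit` evaluated over a range of thresholds and plotted against the treat-all and treat-none references; a model adds clinical value where its curve lies above both.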
Results: Median CT value, skewness, 3D long-axis diameter, and transverse diameter were retained for model construction. Among the seven classifiers, the Gradient Boosting Machine (GBM) model performed best, with AUCs of 0.965 in the training set, 0.908 in the internal validation set, and 0.965 in the external validation set. In the external validation cohort, it achieved an accuracy of 88.1%, a specificity of 80.7%, and an F1 score of 0.87. On DCA, the GBM model showed superior net clinical benefit. SHAP analysis identified median CT value and skewness as the most influential predictors.
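Accuracy, specificity, and F1 are all derived from a confusion matrix at a fixed probability cut-off. The abstract does not state the cut-off or which class is treated as positive, so the sketch below assumes IAC (label 1) is the positive class and uses a hypothetical default cut-off of 0.5; `threshold_metrics` and `ThresholdMetrics` are illustrative names, not part of the study's software.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ThresholdMetrics:
    accuracy: float
    sensitivity: float  # recall of the positive class (assumed: IAC)
    specificity: float  # recall of the negative class (assumed: preinvasive)
    precision: float
    f1: float


def _ratio(num: int, den: int) -> float:
    # Guard against empty classes or no positive predictions.
    return num / den if den else 0.0


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> ThresholdMetrics:
    """Binarize scores at `threshold` (hypothetical default 0.5) and
    derive the usual confusion-matrix metrics."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return ThresholdMetrics(
        accuracy=_ratio(tp + tn, tp + fp + tn + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        precision=_ratio(tp, tp + fp),
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


if __name__ == "__main__":
    # Toy data for illustration only; not from the study cohorts.
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05, 0.45]
    m = threshold_metrics(labels, scores)
    print(round(m.accuracy, 3), round(m.specificity, 3), round(m.f1, 3))  # 0.8 0.833 0.75
```

Unlike AUC, these three figures change with the chosen cut-off, so they are only comparable across studies when the threshold is reported.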
Conclusion: This study presents a simplified ML model built on AI-extracted radiomic features that offers strong predictive performance and biological interpretability for the preoperative risk stratification of GGNs. Using median CT value, skewness, 3D long-axis diameter, and transverse diameter, the model enables accurate and noninvasive differentiation between IAC and indolent lesions, supporting precise surgical planning.
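Two of the four retained predictors, median CT value and skewness, are first-order (histogram) statistics of the attenuation values inside the nodule. The abstract does not describe how the AI software defines them, so the sketch below assumes a flat list of Hounsfield-unit (HU) values taken from an already-segmented nodule and uses the population-moment definition of skewness common in radiomics (third central moment divided by the second central moment raised to the power 1.5). The function names are hypothetical, and the 3D long-axis and transverse diameters, which require the 3D mask geometry, are not sketched.

```python
import statistics
from typing import Sequence


def median_ct_value(hu_values: Sequence[float]) -> float:
    """Median attenuation (HU) of the voxels inside the nodule mask."""
    return statistics.median(hu_values)


def skewness(hu_values: Sequence[float]) -> float:
    """Population skewness m3 / m2**1.5 of the voxel HU distribution.

    Positive values indicate a longer tail toward higher (denser) HU."""
    n = len(hu_values)
    if n == 0:
        raise ValueError("empty voxel list")
    mean = sum(hu_values) / n
    m2 = sum((x - mean) ** 2 for x in hu_values) / n
    m3 = sum((x - mean) ** 3 for x in hu_values) / n
    if m2 == 0:
        return 0.0  # all voxels identical: the distribution has no spread
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Hypothetical voxels: mostly ground-glass attenuation plus one dense voxel.
    voxels = [-700, -650, -600, -550, -100]
    print(median_ct_value(voxels))  # -600
    print(round(skewness(voxels), 2))
```

In the toy example the single dense voxel barely moves the median but produces a clearly positive skewness, which illustrates why the two statistics carry complementary information about the attenuation distribution.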
About the journal:
Thoracic Cancer aims to facilitate international collaboration and exchange of comprehensive and cutting-edge information on basic, translational, and applied clinical research in lung cancer, esophageal cancer, mediastinal cancer, breast cancer and other thoracic malignancies. Prevention, treatment and research relevant to the Asia-Pacific region is a focus area, but submissions from all regions are welcomed. The editors encourage contributions relevant to prevention, general thoracic surgery, medical oncology, radiology, radiation medicine, pathology, basic cancer research, as well as epidemiological and translational studies in thoracic cancer. Thoracic Cancer is the official publication of the Chinese Society of Lung Cancer and the International Chinese Society of Thoracic Surgery, and is endorsed by the Korean Association for the Study of Lung Cancer and the Hong Kong Cancer Therapy Society.
The Journal publishes a range of article types including: Editorials, Invited Reviews, Mini Reviews, Original Articles, Clinical Guidelines, Technological Notes, Imaging in thoracic cancer, Meeting Reports, Case Reports, Letters to the Editor, Commentaries, and Brief Reports.