Machine learning-assisted classification of lung cancer: the role of sarcopenia, inflammatory biomarkers, and PET/CT anatomical-metabolic parameters
Handan Tanyildizi-Kokkulunk, Goksel Alcin, Iffet Cavdar, Resit Akyel, Safak Yigit, Tuba Ciftci-Kusbeci, Gonul Caliskan
Physical and Engineering Sciences in Medicine (journal article, published 2025-10-06)
DOI: 10.1007/s13246-025-01650-x (https://doi.org/10.1007/s13246-025-01650-x)
Abstract
Accurate differentiation between non-cancerous, benign, and malignant lung lesions remains a diagnostic challenge because their clinical and imaging characteristics overlap. This study proposes a multimodal machine learning (ML) framework that integrates positron emission tomography/computed tomography (PET/CT) anatomical-metabolic parameters, sarcopenia markers, and inflammatory biomarkers to improve classification performance in lung cancer. A retrospective dataset of 222 patients was analyzed, including demographic variables, functional and morphometric sarcopenia indices, hematological inflammation markers, and PET/CT-derived parameters such as the maximum and mean standardized uptake values (SUVmax, SUVmean), metabolic tumor volume (MTV), and total lesion glycolysis (TLG). Five ML algorithms (Logistic Regression, Multi-Layer Perceptron, Support Vector Machine, Extreme Gradient Boosting, and Random Forest) were evaluated using standard performance metrics. The Synthetic Minority Oversampling Technique (SMOTE) was applied to balance the class distribution. Feature importance analysis was conducted with the best-performing model, and classification was repeated using the top 15 features. Random Forest achieved the best predictive performance among the models, with a test accuracy of 96%, precision, recall, and F1-score of 0.96 each, and an average AUC of 0.99. Feature importance analysis identified SUVmax, SUVmean, TLG, and skeletal muscle index as the leading predictors. A secondary classification using only the top 15 features yielded a slightly higher test accuracy (97%). These findings underscore the potential of integrating metabolic imaging, physical function, and biochemical inflammation markers into a non-invasive, ML-based diagnostic pipeline. The proposed framework demonstrates high accuracy and generalizability and may serve as an effective clinical decision support tool for early lung cancer diagnosis and risk stratification.
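The abstract lists SUVmax, SUVmean, MTV, and TLG but does not say how lesions were delineated. As a hedged illustration only, the sketch below derives all four from a flat list of voxel SUVs inside a volume of interest, using a fixed-percentage threshold. The 41%-of-SUVmax cutoff, the hypothetical function name `pet_metrics`, and the 0.064 mL voxel (a 4 mm isotropic voxel) are assumptions, not details from the study. TLG is the product of SUVmean and MTV.

```python
# Hypothetical sketch of PET/CT metabolic parameters for one lesion.
# Assumptions (not from the study): per-voxel SUVs are already computed,
# the lesion is segmented by a fixed threshold (41% of SUVmax by default),
# and all voxels share the same volume in millilitres.

def pet_metrics(suv_voxels, voxel_volume_ml, threshold_fraction=0.41):
    """Return SUVmax, SUVmean, MTV (mL) and TLG for one volume of interest."""
    if not suv_voxels:
        raise ValueError("empty volume of interest")
    suv_max = max(suv_voxels)
    cutoff = threshold_fraction * suv_max
    # Voxels at or above the cutoff form the metabolically active lesion.
    lesion = [v for v in suv_voxels if v >= cutoff]
    suv_mean = sum(lesion) / len(lesion)
    mtv = len(lesion) * voxel_volume_ml  # metabolic tumor volume in mL
    tlg = suv_mean * mtv                 # total lesion glycolysis = SUVmean x MTV
    return {"SUVmax": suv_max, "SUVmean": suv_mean, "MTV": mtv, "TLG": tlg}


if __name__ == "__main__":
    voi = [0.8, 1.2, 2.0, 4.5, 6.0, 10.0]  # made-up SUVs for illustration
    print(pet_metrics(voi, voxel_volume_ml=0.064))
```

Threshold-based segmentation is only one option; gradient-based or fixed-SUV (for example SUV 2.5) cutoffs change MTV and TLG, so the segmentation rule matters when comparing values across studies.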
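Skeletal muscle index, one of the leading predictors, is commonly defined in CT-based sarcopenia work as the skeletal muscle cross-sectional area at the third lumbar vertebra (L3) normalized by height squared. The abstract does not state which vertebral level or segmentation method the authors used, so the conventional definition below is an assumption, followed by a worked example with made-up values:

$$
\mathrm{SMI} = \frac{\mathrm{SMA}_{L3}\ [\mathrm{cm}^2]}{\mathrm{height}^2\ [\mathrm{m}^2]},
\qquad
\text{e.g. } \frac{150\ \mathrm{cm}^2}{(1.75\ \mathrm{m})^2} = \frac{150}{3.0625} \approx 48.98\ \mathrm{cm}^2/\mathrm{m}^2
$$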
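The abstract applies SMOTE to balance the classes but gives no parameters. The plain-Python sketch below shows the core idea of the technique: each synthetic sample is a random interpolation between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, the default `k=5` (which matches the `k_neighbors` default in imbalanced-learn), and the seed are assumptions for illustration, not the study's settings.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch for ONE minority class.

    minority: list of feature vectors (lists of floats).
    Returns n_synthetic new vectors, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(base, neighbour)])
    return synthetic
```

Re-sorting neighbours on every draw keeps the sketch short but is quadratic; a production implementation such as `imblearn.over_sampling.SMOTE` precomputes the neighbour graph. In a real pipeline, oversampling is typically fit on the training split only (inside each cross-validation fold) so that no synthetic sample is derived from a test patient; the abstract does not say at which stage SMOTE was applied.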
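For a three-class problem (non-cancerous, benign, malignant), a single precision, recall, and F1 value implies some average over classes, but the abstract does not say which. The sketch below assumes macro averaging, where each class counts equally regardless of its size; the function name `macro_metrics` is hypothetical, and weighted averaging would give different numbers on imbalanced data.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (any number of classes)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Note that macro F1 is the mean of per-class F1 scores, not the harmonic mean of macro precision and macro recall, so the three reported values need not be identical even when they round to the same figure.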