A three-classification machine learning model for non-invasive prediction of molecular subtypes in diffuse glioma: a two-center study
Meilin Zhu, Weishu Hou, Jiahao Gao, Fang Han, Shanshan Huang, Xiaohu Li, Longlin Yin, Jiawen Zhang
Quantitative Imaging in Medicine and Surgery, 15(6): 5752-5768. Published 2025-06-06 (Epub 2025-05-29).
DOI: 10.21037/qims-24-2461 (https://doi.org/10.21037/qims-24-2461)
Abstract
Background: Determining the molecular status of gliomas is crucial for evaluating treatment efficacy and prognosis. However, it currently requires invasive and cumbersome histological analysis. We aimed to develop and validate a non-invasive three-classification machine learning (ML) model to predict the three molecular subtypes of adult-type diffuse gliomas defined by the 2021 World Health Organization Classification of Tumors of the Central Nervous System, 5th edition (WHO CNS 5).
Methods: This retrospective study included 306 glioma patients: 258 from Center 1 (Huashan Hospital; 180 in the training set and 78 in the internal validation set) and 48 from Center 2 (The First Affiliated Hospital of Anhui Medical University; external validation set). Conventional magnetic resonance imaging (MRI) features of the tumors were assessed, and radiomics and Swin Transformer-based deep learning (RSTD) features were extracted from tumor segmentations on axial three-dimensional contrast-enhanced T1-weighted (3D T1C) and T2-fluid-attenuated inversion recovery (T2-FLAIR) sequences. Three types of prediction models (a conventional MRI (CM) model, an RSTD model, and a combined model) were each trained with six ML classifiers [k-nearest neighbor (kNN), light gradient-boosting machine (LightGBM), random forest (RF), support vector machine (SVM), stochastic gradient descent (SGD), and extreme gradient boosting (XGBoost)] to identify the three major molecular subtypes of adult-type diffuse gliomas. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, accuracy, precision, and F1-score.
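For readers unfamiliar with how sensitivity, specificity, precision, and F1-score carry over to a three-class problem, the following is a minimal sketch under a one-vs-rest convention. The abstract does not state which averaging or per-class convention the authors used, so this is an assumption; the function names and the confusion-matrix counts are hypothetical and are not data from the study.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class metrics from a square confusion matrix.

    cm[i][j] is the number of cases whose true class is i and whose
    predicted class is j. Each class is scored against all others.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]                              # class k, predicted k
        fn = sum(cm[k]) - tp                       # class k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp  # other, predicted k
        tn = total - tp - fn - fp                  # other, predicted other
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        results.append({
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "f1": f1,
        })
    return results


def accuracy(cm: List[List[int]]) -> float:
    """Overall accuracy: correctly classified cases over all cases."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


if __name__ == "__main__":
    # Hypothetical counts for three classes (not results from the study).
    toy_cm = [[8, 1, 1],
              [2, 6, 2],
              [0, 1, 9]]
    for k, m in enumerate(one_vs_rest_metrics(toy_cm)):
        print(k, {name: round(value, 3) for name, value in m.items()})
    print("accuracy", round(accuracy(toy_cm), 3))
```

Accuracy is a single number for the whole matrix, whereas the other four metrics are defined per class and would need to be averaged (or reported per subtype) to summarize a three-class model.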
Results: The XGBoost classifier was chosen for model construction because of its superior performance in the training and internal validation cohorts. The combined model, which incorporates CM features, RSTD features, and demographic features, achieved the best performance in both the internal validation set (micro-AUC 0.905, macro-AUC 0.878) and the external validation set (micro-AUC 0.911, macro-AUC 0.891). SHapley Additive exPlanations (SHAP) and gradient-weighted class activation mapping (Grad-CAM) were used to explain the model.
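The distinction between micro-AUC and macro-AUC can be illustrated with a short sketch. It assumes the common one-vs-rest definitions: macro-AUC is the unweighted mean of the per-class AUCs, and micro-AUC pools every (sample, class) pair into a single binary problem. The abstract does not specify the authors' exact computation, and the function names, labels, and probabilities below are hypothetical toy values, not study data.

```python
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_micro_auc(
    y_true: Sequence[int],
    y_prob: Sequence[Sequence[float]],
    n_classes: int,
) -> Tuple[float, float]:
    """Return (macro_auc, micro_auc) under one-vs-rest binarization."""
    per_class: List[float] = []
    pooled_labels: List[int] = []
    pooled_scores: List[float] = []
    for k in range(n_classes):
        labels_k = [1 if y == k else 0 for y in y_true]
        scores_k = [row[k] for row in y_prob]
        per_class.append(binary_auc(labels_k, scores_k))
        pooled_labels.extend(labels_k)   # micro: pool all classes together
        pooled_scores.extend(scores_k)
    macro = sum(per_class) / n_classes
    micro = binary_auc(pooled_labels, pooled_scores)
    return macro, micro


if __name__ == "__main__":
    # Hypothetical labels and predicted class probabilities.
    toy_true = [0, 0, 1, 1, 2, 2]
    toy_prob = [
        [0.7, 0.2, 0.1],
        [0.4, 0.5, 0.1],
        [0.2, 0.6, 0.2],
        [0.3, 0.3, 0.4],
        [0.1, 0.2, 0.7],
        [0.2, 0.2, 0.6],
    ]
    macro, micro = macro_micro_auc(toy_true, toy_prob, n_classes=3)
    print("macro-AUC", round(macro, 3), "micro-AUC", round(micro, 3))
```

Because macro-AUC weights every class equally while micro-AUC weights every case equally, the two can diverge when the subtypes are imbalanced, which is one reason both are commonly reported side by side.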
Conclusions: We constructed a three-classification ML model that combines CM features, RSTD features, and demographic characteristics and that achieved promising performance in predicting the molecular subtypes of diffuse glioma. The combined model provides a non-invasive, timely, and accurate diagnostic approach that can assist clinical decision-making before treatment.