Performance Comparison of Machine Learning Using Radiomic Features and CNN-Based Deep Learning in Benign and Malignant Classification of Vertebral Compression Fractures Using CT Scans.

Journal of Imaging Informatics in Medicine | Pub Date: 2025-06-02 | DOI: 10.1007/s10278-025-01553-z

Jong Chan Yeom, So Hyun Park, Young Jae Kim, Tae Ran Ahn, Kwang Gi Kim


Abstract

Distinguishing benign from malignant vertebral compression fractures (VCFs) is critical for clinical management but remains challenging on contrast-enhanced abdominal CT, which lacks the soft tissue contrast of MRI. This study evaluates and compares radiomic feature-based machine learning (ML) models and a convolutional neural network (CNN)-based deep learning (DL) model for classifying VCFs on abdominal CT. A retrospective cohort of 447 VCFs (196 benign, 251 malignant) from 286 patients was analyzed. Radiomic features were extracted using PyRadiomics, and Recursive Feature Elimination selected six key texture-based features (e.g., Run Variance, Dependence Non-Uniformity Normalized), highlighting textural heterogeneity as a malignancy marker. The ML models (XGBoost, SVM, KNN, Random Forest) and a 3D CNN were trained on the CT data, and performance was assessed via precision, recall, F1 score, accuracy, and AUC. The DL model achieved marginally superior overall performance, with a statistically significantly higher AUC (77.66% vs. 75.91%, p < 0.05) and better precision, F1 score, and accuracy than the top-performing ML model (XGBoost). The DL model's attention maps localized diagnostically relevant regions, mimicking radiologists' focus, whereas radiomics lacked spatial interpretability despite offering quantifiable biomarkers. This study underscores the complementary strengths of ML and DL: radiomics provides interpretable features tied to tumor heterogeneity, while DL autonomously extracts high-dimensional patterns with spatial explainability. Integrating both approaches could enhance diagnostic accuracy and clinician trust in abdominal CT-based VCF assessment. Limitations include the retrospective single-center data and potential selection bias. Future multi-center studies with diverse protocols and histopathological validation are warranted to generalize these findings.
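The abstract names five evaluation metrics (precision, recall, F1 score, accuracy, AUC) without defining them. The sketch below shows one conventional way to compute them for a binary benign (0) versus malignant (1) task, using only the Python standard library. It is not the authors' code: the function names, the 0.5 decision threshold, and the six-case toy data are hypothetical and chosen purely for illustration, and AUC is computed here by the pairwise-ranking (Mann-Whitney) formulation, which may differ from the study's tooling.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, predicted malignant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, predicted malignant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, predicted benign
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, predicted benign

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random malignant case outscores a random benign one.

    Ties between a malignant and a benign score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, NOT taken from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Precision, recall, F1, and accuracy depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent, which is one reason a model can lead on AUC by a small margin while differing more on the thresholded metrics.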
