Trade-Off Analysis of Classical Machine Learning and Deep Learning Models for Robust Brain Tumor Detection: Benchmark Study.

JMIR AI (Impact Factor: 2.0)
Pub Date: 2025-09-15 | DOI: 10.2196/76344
Yuting Tian
{"title":"Trade-Off Analysis of Classical Machine Learning and Deep Learning Models for Robust Brain Tumor Detection: Benchmark Study.","authors":"Yuting Tian","doi":"10.2196/76344","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Medical image analysis plays a critical role in brain tumor detection, but training deep learning models often requires large, labeled datasets, which can be time-consuming and costly. This study explores a comparative analysis of machine learning and deep learning models for brain tumor classification, focusing on whether deep learning models are necessary for small medical datasets and whether self-supervised learning can reduce annotation costs.</p><p><strong>Objective: </strong>The primary goal is to evaluate trade-offs between traditional machine learning and deep learning, including self-supervised models under small medical image data. The secondary goal is to assess model robustness, transferability, and generalization through evaluation of unseen data within- and cross-domains.</p><p><strong>Methods: </strong>Four models were compared: (1) support vector machine (SVM) with histogram of oriented gradients (HOG) features, (2) a convolutional neural network based on ResNet18, (3) a transformer-based model using vision transformer (ViT-B/16), and (4) a self-supervised learning approach using Simple Contrastive Learning of Visual Representations (SimCLR). These models were selected to represent diverse paradigms. SVM+HOG represents traditional feature engineering with low computational cost, ResNet18 serves as a well-established convolutional neural network with strong baseline performance, ViT-B/16 leverages self-attention to capture long-range spatial features, and SimCLR enables learning from unlabeled data, potentially reducing annotation costs. The primary dataset consisted of 2870 brain magnetic resonance images across 4 classes: glioma, meningioma, pituitary, and nontumor. All models were trained under consistent settings, including data augmentation, early stopping, and 3 independent runs using the different random seeds to account for performance variability. Performance metrics included accuracy, precision, recall, F<sub>1</sub>-score, and convergence. To assess robustness and generalization capability, evaluation was performed on unseen test data from both the primary and cross datasets. No retraining or test augmentations were applied to the external data, thereby reflecting realistic deployment conditions. The models demonstrated consistently strong performance in both within-domain and cross-domain evaluations.</p><p><strong>Results: </strong>The results revealed distinct trade-offs; ResNet18 achieved the highest validation accuracy (mean 99.77%, SD 0.00%) and the lowest validation loss, along with a weighted test accuracy of 99% within-domain and 95% cross-domain. SimCLR reached a mean validation accuracy of 97.29% (SD 0.86%) and achieved up to 97% weighted test accuracy within-domain and 91% cross-domain, despite requiring 2-stage training phases involving contrastive pretraining followed by linear evaluation. ViT-B/16 reached a mean validation accuracy of 97.36% (SD 0.11%), with a weighted test accuracy of 98% within-domain and 93% cross-domain. 
SVM+HOG maintained a competitive validation accuracy of 96.51%, with 97% within-domain test accuracy, though its accuracy dropped to 80% cross-domain.</p><p><strong>Conclusions: </strong>The study reveals meaningful trade-offs between model complexity, annotation requirements, and deployment feasibility-critical factors for selecting models in real-world medical imaging applications.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e76344"},"PeriodicalIF":2.0000,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456844/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/76344","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Background: Medical image analysis plays a critical role in brain tumor detection, but training deep learning models often requires large labeled datasets, which can be time-consuming and costly to assemble. This study presents a comparative analysis of machine learning and deep learning models for brain tumor classification, focusing on whether deep learning models are necessary for small medical datasets and whether self-supervised learning can reduce annotation costs.

Objective: The primary goal is to evaluate the trade-offs between traditional machine learning and deep learning models, including self-supervised models, on small medical image datasets. The secondary goal is to assess model robustness, transferability, and generalization through evaluation on unseen data both within-domain and cross-domain.

Methods: Four models were compared: (1) a support vector machine (SVM) with histogram of oriented gradients (HOG) features, (2) a convolutional neural network based on ResNet18, (3) a transformer-based model using a vision transformer (ViT-B/16), and (4) a self-supervised learning approach using Simple Contrastive Learning of Visual Representations (SimCLR). These models were selected to represent diverse paradigms: SVM+HOG represents traditional feature engineering with low computational cost, ResNet18 serves as a well-established convolutional neural network with strong baseline performance, ViT-B/16 leverages self-attention to capture long-range spatial features, and SimCLR enables learning from unlabeled data, potentially reducing annotation costs. The primary dataset consisted of 2870 brain magnetic resonance images across 4 classes: glioma, meningioma, pituitary, and nontumor. All models were trained under consistent settings, including data augmentation, early stopping, and 3 independent runs with different random seeds to account for performance variability. Performance metrics included accuracy, precision, recall, F1-score, and convergence. To assess robustness and generalization capability, evaluation was performed on unseen test data from both the primary dataset and a cross-domain dataset. No retraining or test-time augmentation was applied to the external data, thereby reflecting realistic deployment conditions. The models demonstrated consistently strong performance in both within-domain and cross-domain evaluations.
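
To make the classical baseline concrete, here is a minimal sketch of an SVM+HOG pipeline of the kind the study benchmarks, written against scikit-image and scikit-learn. The image size, HOG parameters, and SVM hyperparameters are illustrative assumptions, not the paper's reported settings, and the synthetic arrays stand in for real MR slices.

```python
# Hedged sketch of an SVM+HOG baseline for 4-class brain MRI classification.
# HOG parameters and the SVM configuration are illustrative assumptions,
# not the settings reported in the paper.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def hog_features(images, size=(128, 128)):
    """Resize grayscale slices and extract one HOG descriptor per image."""
    feats = []
    for img in images:
        img = resize(img, size, anti_aliasing=True)
        feats.append(hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2), block_norm="L2-Hys"))
    return np.asarray(feats)

# Synthetic stand-ins for MR slices; replace with real data and labels
# in {0: glioma, 1: meningioma, 2: pituitary, 3: nontumor}.
rng = np.random.default_rng(0)
X_train = [rng.random((256, 256)) for _ in range(40)]
y_train = rng.integers(0, 4, size=40)
X_test = [rng.random((256, 256)) for _ in range(10)]
y_test = rng.integers(0, 4, size=10)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(hog_features(X_train), y_train)
print(f"test accuracy: {clf.score(hog_features(X_test), y_test):.2f}")
```

This kind of pipeline trains in seconds on a CPU, which is the low-computational-cost end of the trade-off the abstract describes.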
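
The multi-seed protocol can also be sketched in a few lines. Here, train_and_validate is a hypothetical placeholder for any of the four training routines, included only so the aggregation over seeds is runnable; the seed values are illustrative.

```python
# Hedged sketch of the 3-seed protocol: train under 3 different seeds and
# report the mean and SD of validation accuracy, as the abstract does.
import random
import numpy as np

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    # torch.manual_seed(seed)  # add framework-specific seeding as needed

def train_and_validate(seed):
    # Hypothetical placeholder: substitute a real train/validate loop
    # that returns validation accuracy for this seed.
    return 0.97 + np.random.rand() * 0.01

accs = []
for seed in (0, 1, 2):  # illustrative seed values
    set_seed(seed)
    accs.append(train_and_validate(seed))

print(f"validation accuracy: mean {np.mean(accs):.2%}, SD {np.std(accs):.2%}")
```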

Results: The results revealed distinct trade-offs. ResNet18 achieved the highest validation accuracy (mean 99.77%, SD 0.00%) and the lowest validation loss, along with a weighted test accuracy of 99% within-domain and 95% cross-domain. SimCLR reached a mean validation accuracy of 97.29% (SD 0.86%) and achieved up to 97% weighted test accuracy within-domain and 91% cross-domain, despite requiring a 2-stage training process: contrastive pretraining followed by linear evaluation. ViT-B/16 reached a mean validation accuracy of 97.36% (SD 0.11%), with a weighted test accuracy of 98% within-domain and 93% cross-domain. SVM+HOG maintained a competitive validation accuracy of 96.51%, with 97% within-domain test accuracy, though its accuracy dropped to 80% cross-domain.
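
Because SimCLR's extra cost comes from its two stages, a short sketch of the second stage (linear evaluation) may help: the contrastively pretrained encoder is frozen and only a linear head is trained on its features. This is a generic PyTorch illustration under assumed names (the checkpoint path and data loader are placeholders), not the paper's implementation.

```python
# Hedged sketch of SimCLR stage 2 (linear evaluation) in PyTorch: freeze the
# pretrained encoder and train only a linear classifier on its features.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18(weights=None)
encoder.fc = nn.Identity()  # expose the 512-d pooled features
# encoder.load_state_dict(torch.load("simclr_encoder.pt"))  # assumed checkpoint
for p in encoder.parameters():  # freeze the backbone
    p.requires_grad = False
encoder.eval()

head = nn.Linear(512, 4)  # glioma, meningioma, pituitary, nontumor
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for a real DataLoader over MR images.
train_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,)))]

for epoch in range(5):
    for images, labels in train_loader:
        with torch.no_grad():
            feats = encoder(images)  # frozen representations
        loss = loss_fn(head(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Only the 2052 parameters of the linear head are updated here, which is why this stage is cheap relative to the contrastive pretraining that precedes it.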

Conclusions: The study reveals meaningful trade-offs between model complexity, annotation requirements, and deployment feasibility, all critical factors for selecting models in real-world medical imaging applications.
