Comparative Evaluation of Deep Learning and Foundation Model Embeddings for Osteoarthritis Feature Classification in Knee Radiographs

Mohammadreza Chavoshi, Hari Trivedi, Janice Newsome, Aawez Mansuri, Frank Li, Theo Dapamede, Bardia Khosravi, Judy Gichoya

Journal of Imaging Informatics in Medicine, published online 2025-09-02. DOI: 10.1007/s10278-025-01636-x (https://doi.org/10.1007/s10278-025-01636-x)
Abstract
Foundation models (FMs) offer a promising alternative to supervised deep learning (DL) by enabling greater flexibility and generalizability without relying on large, labeled datasets. This study compares supervised DL models with pre-trained FM embeddings for classifying radiographic features of knee osteoarthritis. We analyzed 44,985 knee radiographs from the Osteoarthritis Initiative dataset. Two convolutional neural network models (ResNet18 and ConvNeXt-Small) were trained to classify osteophytes, joint space narrowing, subchondral sclerosis, and Kellgren-Lawrence grades (KLG). These models were compared against two FMs: BiomedCLIP, a multimodal vision-language model pre-trained on diverse medical images and text, and RAD-DINO, a vision transformer pre-trained exclusively on chest radiographs. We extracted image embeddings from both FMs and trained XGBoost classifiers for downstream classification. Performance was assessed with a comprehensive set of metrics appropriate for binary and multi-class classification tasks. The DL models outperformed the FM-based approaches across all tasks. ConvNeXt achieved the highest performance in predicting KLG, with a weighted Cohen's kappa of 0.880, and higher AUCs in the binary tasks. BiomedCLIP and RAD-DINO performed similarly; BiomedCLIP's prior exposure to knee radiographs during pretraining yielded only slight improvements. Zero-shot classification with BiomedCLIP correctly identified 91.14% of knee radiographs, with most failures associated with low image quality. Grad-CAM visualizations showed that the DL models, particularly ConvNeXt, reliably focused on clinically relevant regions. While FMs offer promising utility in auxiliary imaging tasks, supervised DL remains superior for fine-grained radiographic feature classification in domains with limited pretraining representation, such as musculoskeletal imaging.
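To make the foundation-model branch of the pipeline concrete, the sketch below shows one plausible implementation of the embedding-plus-XGBoost approach described in the abstract: frozen BiomedCLIP image embeddings used as features for a gradient-boosted classifier of Kellgren-Lawrence grades. This is a minimal illustration, not the authors' code; the Hugging Face checkpoint identifier, the helper names (embed_radiographs, evaluate_klg), the XGBoost hyperparameters, and the quadratic weighting of Cohen's kappa are all assumptions made for the example.

```python
# Hypothetical sketch of the FM-embedding pipeline: frozen BiomedCLIP image
# embeddings fed to an XGBoost classifier for Kellgren-Lawrence grading.
import numpy as np
import torch
from PIL import Image
from open_clip import create_model_from_pretrained
from sklearn.metrics import cohen_kappa_score
from xgboost import XGBClassifier

# Public BiomedCLIP checkpoint on the Hugging Face hub (identifier assumed).
CHECKPOINT = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(CHECKPOINT)
model.eval()


@torch.no_grad()
def embed_radiographs(paths, batch_size=32):
    """Return one frozen BiomedCLIP image embedding per radiograph file path."""
    feats = []
    for i in range(0, len(paths), batch_size):
        batch = torch.stack(
            [preprocess(Image.open(p).convert("RGB")) for p in paths[i:i + batch_size]]
        )
        feats.append(model.encode_image(batch).cpu().numpy())
    return np.concatenate(feats, axis=0)


def evaluate_klg(train_paths, train_klg, test_paths, test_klg):
    """Fit an XGBoost classifier on FM embeddings and report weighted kappa.

    train_klg / test_klg are Kellgren-Lawrence grades (integers 0-4).
    """
    X_train = embed_radiographs(train_paths)
    X_test = embed_radiographs(test_paths)

    clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
    clf.fit(X_train, train_klg)
    pred = clf.predict(X_test)

    # The abstract reports a weighted Cohen's kappa; quadratic weights are a
    # common choice for ordinal KL grades and are assumed here.
    return cohen_kappa_score(test_klg, pred, weights="quadratic")
```

The sketch reflects the contrast the study draws: the foundation model stays frozen and only a lightweight classifier is fit on its embeddings, whereas the supervised DL baselines (ResNet18, ConvNeXt-Small) are trained end to end on the labeled radiographs.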