Comparative Evaluation of Deep Learning and Foundation Model Embeddings for Osteoarthritis Feature Classification in Knee Radiographs.

Mohammadreza Chavoshi, Hari Trivedi, Janice Newsome, Aawez Mansuri, Frank Li, Theo Dapamede, Bardia Khosravi, Judy Gichoya
Journal of Imaging Informatics in Medicine, published 2025-09-02. DOI: 10.1007/s10278-025-01636-x (https://doi.org/10.1007/s10278-025-01636-x)

Abstract

Foundation models (FMs) offer a promising alternative to supervised deep learning (DL) by enabling greater flexibility and generalizability without relying on large, labeled datasets. This study investigates the performance of supervised DL models and pre-trained FM embeddings in classifying radiographic features related to knee osteoarthritis. We analyzed 44,985 knee radiographs from the Osteoarthritis Initiative dataset. Two convolutional neural network models (ResNet18 and ConvNeXt-Small) were trained to classify osteophytes, joint space narrowing, subchondral sclerosis, and Kellgren-Lawrence grades (KLG). These models were compared against two FMs: BiomedCLIP, a multimodal vision-language model pre-trained on diverse medical images and text, and RAD-DINO, a vision transformer model pre-trained exclusively on chest radiographs. We extracted image embeddings from both FMs and used XGBoost classifiers to perform downstream classification. Performance was assessed using a comprehensive set of classification metrics appropriate for binary and multi-class classification tasks. DL models outperformed FM-based approaches across all tasks. ConvNeXt achieved the highest performance in predicting KLG, with a weighted Cohen's kappa of 0.880, and higher AUC in the binary tasks. BiomedCLIP and RAD-DINO performed similarly, and BiomedCLIP's prior exposure to knee radiographs during pretraining led to only slight improvements. Zero-shot classification using BiomedCLIP correctly identified 91.14% of knee radiographs, with most failures associated with low image quality. Grad-CAM visualizations revealed that DL models, particularly ConvNeXt, reliably focused on clinically relevant regions. While FMs offer promising utility in auxiliary imaging tasks, supervised DL remains superior for fine-grained radiographic feature classification in domains with limited pretraining representation, such as musculoskeletal imaging.
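
The weighted Cohen's kappa reported for KLG prediction can be illustrated with a minimal sketch. The abstract does not state which weighting scheme was used; quadratic weighting (`power=2`) is assumed here because it is a common choice for ordinal grades, and the function name, parameters, and toy labels are all hypothetical, not taken from the study.

```python
from collections import Counter


def weighted_kappa(y_true, y_pred, n_classes, power=2):
    """Weighted Cohen's kappa for ordinal labels 0..n_classes-1.

    power=1 gives linear weights, power=2 quadratic weights.
    The weighting used in the study is not stated; quadratic is an assumption.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    observed = Counter(zip(y_true, y_pred))  # joint counts of (true, predicted)
    true_counts = Counter(y_true)            # marginal counts of true grades
    pred_counts = Counter(y_pred)            # marginal counts of predicted grades
    max_dist = (n_classes - 1) ** power      # normalizes weights into [0, 1]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = abs(i - j) ** power / max_dist
            observed_disagreement += weight * observed[(i, j)] / n
            # Disagreement expected by chance if the two ratings were independent.
            expected_disagreement += weight * true_counts[i] * pred_counts[j] / (n * n)

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - observed_disagreement / expected_disagreement


# Toy KL grades (0-4), purely illustrative.
grades = [0, 1, 2, 3, 4]
near_miss = [1, 1, 2, 3, 4]  # one prediction off by a single grade
far_miss = [4, 1, 2, 3, 4]   # one prediction off by four grades
print(weighted_kappa(grades, near_miss, n_classes=5))
print(weighted_kappa(grades, far_miss, n_classes=5))
```

With weighting, a prediction that misses by one grade is penalized far less than one that misses by four, which is why a weighted kappa suits an ordinal scale such as KLG better than plain accuracy.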
