Rethinking Domain-Specific Pretraining by Supervised or Self-Supervised Learning for Chest Radiograph Classification: A Comparative Study Against ImageNet Counterparts in Cold-Start Active Learning

Han Yuan, Mingcheng Zhu, Rui Yang, Han Liu, Irene Li, Chuan Hong
{"title":"Rethinking Domain-Specific Pretraining by Supervised or Self-Supervised Learning for Chest Radiograph Classification: A Comparative Study Against ImageNet Counterparts in Cold-Start Active Learning","authors":"Han Yuan,&nbsp;Mingcheng Zhu,&nbsp;Rui Yang,&nbsp;Han Liu,&nbsp;Irene Li,&nbsp;Chuan Hong","doi":"10.1002/hcs2.70009","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Objective</h3>\n \n <p>Deep learning (DL) has become the prevailing method in chest radiograph analysis, yet its performance heavily depends on large quantities of annotated images. To mitigate the cost, cold-start active learning (AL), comprising an initialization followed by subsequent learning, selects a small subset of informative data points for labeling. Recent advancements in pretrained models by supervised or self-supervised learning tailored to chest radiograph have shown broad applicability to diverse downstream tasks. However, their potential in cold-start AL remains unexplored.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>To validate the efficacy of domain-specific pretraining, we compared two foundation models: supervised TXRV and self-supervised REMEDIS with their general domain counterparts pretrained on ImageNet. Model performance was evaluated at both initialization and subsequent learning stages on two diagnostic tasks: psychiatric pneumonia and COVID-19. For initialization, we assessed their integration with three strategies: diversity, uncertainty, and hybrid sampling. For subsequent learning, we focused on uncertainty sampling powered by different pretrained models. We also conducted statistical tests to compare the foundation models with ImageNet counterparts, investigate the relationship between initialization and subsequent learning, examine the performance of one-shot initialization against the full AL process, and investigate the influence of class balance in initialization samples on initialization and subsequent learning.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>First, domain-specific foundation models failed to outperform ImageNet counterparts in six out of eight experiments on informative sample selection. Both domain-specific and general pretrained models were unable to generate representations that could substitute for the original images as model inputs in seven of the eight scenarios. However, pretrained model-based initialization surpassed random sampling, the default approach in cold-start AL. Second, initialization performance was positively correlated with subsequent learning performance, highlighting the importance of initialization strategies. Third, one-shot initialization performed comparably to the full AL process, demonstrating the potential of reducing experts' repeated waiting during AL iterations. Last, a U-shaped correlation was observed between the class balance of initialization samples and model performance, suggesting that the class balance is more strongly associated with performance at middle budget levels than at low or high budgets.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>In this study, we highlighted the limitations of medical pretraining compared to general pretraining in the context of cold-start AL. 
We also identified promising outcomes related to cold-start AL, including initialization based on pretrained models, the positive influence of initialization on subsequent learning, the potential for one-shot initialization, and the influence of class balance on middle-budget AL. Researchers are encouraged to improve medical pretraining for versatile DL foundations and explore novel AL methods.</p>\n </section>\n </div>","PeriodicalId":100601,"journal":{"name":"Health Care Science","volume":"4 2","pages":"110-143"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/hcs2.70009","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health Care Science","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/hcs2.70009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Objective

Deep learning (DL) has become the prevailing method in chest radiograph analysis, yet its performance heavily depends on large quantities of annotated images. To mitigate the annotation cost, cold-start active learning (AL), comprising an initialization stage followed by subsequent learning, selects a small subset of informative data points for labeling. Recent advances in pretrained models, built by supervised or self-supervised learning tailored to chest radiographs, have shown broad applicability to diverse downstream tasks. However, their potential in cold-start AL remains unexplored.
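
A minimal sketch of this two-stage protocol is given below, assuming a random initialization batch, a logistic-regression classifier, and margin-based uncertainty sampling; the budgets and function names are likewise illustrative and do not reflect the study's actual configuration.

```python
# Minimal sketch of cold-start active learning: stage 1 selects an initial
# batch to label, stage 2 repeatedly retrains a classifier and queries the
# most uncertain remaining images. All choices here (random initialization,
# logistic regression, budgets) are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression


def uncertainty_query(model, features, candidates, k):
    """Pick the k candidates whose predicted positive probability is closest to 0.5."""
    proba = model.predict_proba(features[candidates])[:, 1]
    return candidates[np.argsort(np.abs(proba - 0.5))[:k]]


def cold_start_al(features, oracle_labels, init_budget=20, round_budget=20, n_rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    pool = np.arange(len(features))
    # Stage 1: initialization. Random sampling is the default baseline that
    # smarter, pretrained-model-based strategies aim to beat; in practice the
    # initial batch must contain both classes for the classifier to be trainable.
    labeled = rng.choice(pool, size=init_budget, replace=False)
    model = None
    for _ in range(n_rounds):
        # Stage 2: subsequent learning driven by uncertainty sampling.
        model = LogisticRegression(max_iter=1000).fit(features[labeled], oracle_labels[labeled])
        remaining = np.setdiff1d(pool, labeled)
        labeled = np.concatenate([labeled, uncertainty_query(model, features, remaining, round_budget)])
    return model, labeled
```

In this framing, the question studied here is whether embeddings or predictions from pretrained models (TXRV, REMEDIS, or ImageNet backbones) can replace the random choice in the initialization stage and sharpen the uncertainty scores used in subsequent learning.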

Methods

To validate the efficacy of domain-specific pretraining, we compared two foundation models, the supervised TXRV and the self-supervised REMEDIS, with their general-domain counterparts pretrained on ImageNet. Model performance was evaluated at both the initialization and subsequent learning stages on two diagnostic tasks: pediatric pneumonia and COVID-19. For initialization, we assessed the models' integration with three strategies: diversity, uncertainty, and hybrid sampling. For subsequent learning, we focused on uncertainty sampling powered by different pretrained models. We also conducted statistical tests to compare the foundation models with their ImageNet counterparts, investigate the relationship between initialization and subsequent learning, examine the performance of one-shot initialization against the full AL process, and assess the influence of class balance in the initialization samples on both initialization and subsequent learning.
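
To illustrate how a frozen pretrained encoder can drive these initialization strategies, the sketch below embeds the unlabeled pool with an ImageNet-pretrained ResNet-50 (TXRV or REMEDIS features would be plugged in the same way) and selects the initial batch by diversity, a label-free uncertainty proxy, or a hybrid of the two. The k-means clustering, the distance-based proxy score, and all function names are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np
import torch
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.cluster import KMeans

# Frozen general-domain encoder; a chest-radiograph encoder such as TXRV or
# REMEDIS would be swapped in here to produce domain-specific embeddings.
weights = ResNet50_Weights.IMAGENET1K_V2
encoder = resnet50(weights=weights)
encoder.fc = torch.nn.Identity()   # keep the 2048-d pooled representation
encoder.eval()
preprocess = weights.transforms()  # the resize/normalize pipeline the backbone expects


@torch.no_grad()
def embed(pil_images):
    """Embed a list of PIL images into an (N, 2048) feature matrix."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    return encoder(batch).numpy()


def diversity_init(emb, budget):
    """Cluster the pool into `budget` groups and keep the sample nearest each centroid."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=0).fit(emb)
    dists = np.linalg.norm(emb[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    return np.unique(dists.argmin(axis=0))  # duplicate picks, if any, shrink the batch slightly


def uncertainty_init(emb, budget):
    """Label-free proxy: treat samples far from the pool's mean embedding as uncertain."""
    score = np.linalg.norm(emb - emb.mean(axis=0), axis=1)
    return np.argsort(-score)[:budget]


def hybrid_init(emb, budget):
    """Spend half the budget on diversity and fill the remainder by the uncertainty proxy."""
    picked = list(diversity_init(emb, budget // 2))
    for idx in uncertainty_init(emb, budget):
        if len(picked) == budget:
            break
        if idx not in picked:
            picked.append(int(idx))
    return np.asarray(picked)
```

In practice, the uncertainty strategy could instead score samples with the pretrained model's own predictions (for example, TXRV's pathology probabilities); the distance-based proxy above is only a stand-in that requires no labels or task-specific head.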

Results

First, domain-specific foundation models failed to outperform ImageNet counterparts in six out of eight experiments on informative sample selection. Both domain-specific and general pretrained models were unable to generate representations that could substitute for the original images as model inputs in seven of the eight scenarios. However, pretrained model-based initialization surpassed random sampling, the default approach in cold-start AL. Second, initialization performance was positively correlated with subsequent learning performance, highlighting the importance of initialization strategies. Third, one-shot initialization performed comparably to the full AL process, demonstrating the potential of reducing experts' repeated waiting during AL iterations. Last, a U-shaped correlation was observed between the class balance of initialization samples and model performance, suggesting that the class balance is more strongly associated with performance at middle budget levels than at low or high budgets.

Conclusions

In this study, we highlighted the limitations of medical pretraining compared to general pretraining in the context of cold-start AL. We also identified promising outcomes related to cold-start AL, including initialization based on pretrained models, the positive influence of initialization on subsequent learning, the potential for one-shot initialization, and the influence of class balance on middle-budget AL. Researchers are encouraged to improve medical pretraining for versatile DL foundations and explore novel AL methods.

