Deep learning (DL) has become the prevailing method in chest radiograph analysis, yet its performance depends heavily on large quantities of annotated images. To mitigate the annotation cost, cold-start active learning (AL), comprising an initialization stage followed by subsequent learning, selects a small subset of informative data points for labeling. Recent pretrained models, built with supervised or self-supervised learning tailored to chest radiographs, have shown broad applicability to diverse downstream tasks. However, their potential in cold-start AL remains unexplored.
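To make the two stages concrete, the following is a minimal Python sketch of a generic cold-start AL loop, not the code used in this study; `embed`, `acquire`, `train_model`, and `oracle_label` are hypothetical placeholders for a pretrained feature extractor, a query strategy, a task-model training routine, and an expert annotator.

```python
import numpy as np

def cold_start_al(images, embed, acquire, train_model, oracle_label,
                  init_budget=100, rounds=5, per_round=50):
    """Hypothetical cold-start AL loop: label-free initialization, then
    iterative querying of an expert oracle (subsequent learning)."""
    feats = embed(images)                                 # pretrained-model representations
    labeled = list(acquire(feats, model=None, k=init_budget))  # initialization (no labels yet)
    labels = {i: oracle_label(images[i]) for i in labeled}

    model = None
    for _ in range(rounds):                               # subsequent learning iterations
        model = train_model([images[i] for i in labeled],
                            [labels[i] for i in labeled])
        pool = np.array([i for i in range(len(images)) if i not in labels])
        picked = acquire(feats[pool], model=model, k=per_round)
        for i in pool[picked]:
            labels[i] = oracle_label(images[i])
            labeled.append(i)
    return model
```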
To validate the efficacy of domain-specific pretraining, we compared two foundation models, the supervised TXRV and the self-supervised REMEDIS, with their general-domain counterparts pretrained on ImageNet. Model performance was evaluated at both the initialization and subsequent learning stages on two diagnostic tasks: pediatric pneumonia and COVID-19. For initialization, we assessed the integration of the pretrained models with three strategies: diversity, uncertainty, and hybrid sampling. For subsequent learning, we focused on uncertainty sampling powered by the different pretrained models. We also conducted statistical tests to compare the foundation models with their ImageNet counterparts, investigate the relationship between initialization and subsequent learning, examine the performance of one-shot initialization against the full AL process, and assess how the class balance of initialization samples influences initialization and subsequent learning.
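As an illustration of the three initialization strategies applied to pretrained-model embeddings, below is a minimal sketch assuming scikit-learn and NumPy; the function names, the entropy-based uncertainty proxy, and the hybrid weighting are illustrative choices, not the implementation evaluated in this study. At initialization no task model exists yet, so the `probs` used for uncertainty and hybrid sampling would have to come from a proxy, for example a lightweight classifier fitted on cluster pseudo-labels over the embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

def diversity_sampling(embeddings, k):
    """Pick the samples closest to k-means cluster centers (diverse coverage)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, embeddings)
    return idx  # duplicates possible if two centers share a nearest sample; ignored for brevity

def uncertainty_sampling(probs, k):
    """Pick the samples with the highest predictive entropy."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:k]

def hybrid_sampling(embeddings, probs, k, alpha=0.5):
    """Rank by a weighted mix of entropy and distance from the embedding centroid."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    dist = np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1)
    score = alpha * (entropy / entropy.max()) + (1 - alpha) * (dist / dist.max())
    return np.argsort(-score)[:k]
```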
First, the domain-specific foundation models failed to outperform their ImageNet counterparts in six of the eight experiments on informative sample selection. Neither the domain-specific nor the general pretrained models generated representations that could substitute for the original images as model inputs in seven of the eight scenarios. However, pretrained model-based initialization surpassed random sampling, the default approach in cold-start AL. Second, initialization performance was positively correlated with subsequent learning performance, highlighting the importance of initialization strategies. Third, one-shot initialization performed comparably to the full AL process, demonstrating the potential to spare experts repeated waiting during AL iterations. Last, a U-shaped correlation was observed between the class balance of initialization samples and model performance, suggesting that class balance is more strongly associated with performance at middle budget levels than at low or high budgets.
In this study, we highlighted the limitations of medical pretraining relative to general pretraining in the context of cold-start AL. We also identified promising directions for cold-start AL, including initialization based on pretrained models, the positive influence of initialization on subsequent learning, the potential of one-shot initialization, and the influence of class balance on middle-budget AL. Researchers are encouraged to improve medical pretraining toward versatile DL foundation models and to explore novel AL methods.