Self-supervised pretraining in the wild imparts image acquisition robustness to medical image transformers: an application to lung cancer segmentation.
{"title":"Self-supervised pretraining in the wild imparts image acquisition robustness to medical image transformers: an application to lung cancer segmentation.","authors":"Jue Jiang, Harini Veeraraghavan","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Self-supervised learning (SSL) is an approach to pretrain models with unlabeled datasets and extract useful feature representations such that these models can be easily fine-tuned for various downstream tasks. Self-pretraining applies SSL on curated task-specific datasets without using task-specific labels. Increasing availability of public data repositories has now made it possible to utilize diverse and large, task unrelated datasets to pretrain models in the \"wild\" using SSL. However, the benefit of such wild-pretraining over self-pretraining has not been studied in the context of medical image analysis. Hence, we analyzed transformers (Swin and ViT) and a convolutional neural network created using wild- and self-pretraining trained to segment lung tumors from 3D-computed tomography (CT) scans in terms of: (a) accuracy, (b) fine-tuning epoch efficiency, and (c) robustness to image acquisition differences (contrast versus non-contrast, slice thickness, and image reconstruction kernels). We also studied feature reuse using centered kernel alignment (CKA) with the Swin networks. Our analysis with two independent testing (public N = 139; internal N = 196) datasets showed that wild-pretrained Swin models significantly outperformed self-pretrained Swin for the various imaging acquisitions. Fine-tuning epoch efficiency was higher for both wild-pretrained Swin and ViT models compared to their self-pretrained counterparts. Feature reuse close to the final encoder layers was lower than in the early layers for wild-pretrained models irrespective of the pretext tasks used in SSL. Models and code will be made available through GitHub upon manuscript acceptance.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"250 ","pages":"708-721"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11741178/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of machine learning research","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Self-supervised learning (SSL) is an approach to pretraining models on unlabeled datasets to extract useful feature representations, such that the models can be easily fine-tuned for various downstream tasks. Self-pretraining applies SSL to curated task-specific datasets without using task-specific labels. The increasing availability of public data repositories has made it possible to use diverse, large, task-unrelated datasets to pretrain models in the "wild" with SSL. However, the benefit of such wild-pretraining over self-pretraining has not been studied in the context of medical image analysis. Hence, we analyzed transformers (Swin and ViT) and a convolutional neural network created with wild- and self-pretraining and trained to segment lung tumors from 3D computed tomography (CT) scans in terms of: (a) accuracy, (b) fine-tuning epoch efficiency, and (c) robustness to image acquisition differences (contrast versus non-contrast, slice thickness, and image reconstruction kernels). We also studied feature reuse in the Swin networks using centered kernel alignment (CKA). Our analysis on two independent testing datasets (public N = 139; internal N = 196) showed that wild-pretrained Swin models significantly outperformed self-pretrained Swin models across the various imaging acquisitions. Fine-tuning epoch efficiency was higher for both wild-pretrained Swin and ViT models than for their self-pretrained counterparts. For wild-pretrained models, feature reuse close to the final encoder layers was lower than in the early layers, irrespective of the pretext tasks used in SSL. Models and code will be made available through GitHub upon manuscript acceptance.
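The abstract measures feature reuse with centered kernel alignment (CKA). As a point of reference, below is a minimal sketch of linear CKA (Kornblith et al., 2019), not the authors' released code; the feature shapes and the comparison of pretrained versus fine-tuned activations are illustrative assumptions.

```python
# Minimal sketch of linear CKA for comparing layer activations.
# Assumed inputs: feature matrices of shape (n_samples, n_features),
# e.g., flattened Swin encoder-layer activations before and after fine-tuning.
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, n_features) feature matrices."""
    # Center each feature dimension across samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Illustrative usage: CKA near 1 indicates strong reuse of pretrained features
# by the corresponding fine-tuned layer; lower values indicate the layer changed.
rng = np.random.default_rng(0)
pretrained_feats = rng.normal(size=(128, 768))                     # hypothetical layer activations
finetuned_feats = pretrained_feats + 0.1 * rng.normal(size=(128, 768))
print(f"CKA: {linear_cka(pretrained_feats, finetuned_feats):.3f}")
```

In this reading, computing CKA layer by layer between the pretrained and fine-tuned encoders is what lets the study conclude that early layers are reused more than the layers closest to the encoder output.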