Benchmarking and Boosting Transformers for Medical Image Classification.

DongAo Ma, Mohammad Reza Hosseinzadeh Taher, Jiaxuan Pang, Nahid Ui Islam, Fatemeh Haghighi, Michael B Gotway, Jianming Liang
{"title":"Benchmarking and Boosting Transformers for Medical Image Classification.","authors":"DongAo Ma, Mohammad Reza Hosseinzadeh Taher, Jiaxuan Pang, Nahid Ui Islam, Fatemeh Haghighi, Michael B Gotway, Jianming Liang","doi":"10.1007/978-3-031-16852-9_2","DOIUrl":null,"url":null,"abstract":"<p><p>Visual transformers have recently gained popularity in the computer vision community as they began to outrank convolutional neural networks (CNNs) in one representative visual benchmark after another. However, the competition between visual transformers and CNNs in medical imaging is rarely studied, leaving many important questions unanswered. As the first step, we benchmark how well existing transformer variants that use various (supervised and self-supervised) pre-training methods perform against CNNs on a variety of medical classification tasks. Furthermore, given the data-hungry nature of transformers and the annotation-deficiency challenge of medical imaging, we present a practical approach for bridging the domain gap between photographic and medical images by utilizing unlabeled large-scale in-domain data. Our extensive empirical evaluations reveal the following insights in medical imaging: (1) good initialization is more crucial for transformer-based models than for CNNs, (2) self-supervised learning based on masked image modeling captures more generalizable representations than supervised models, and (3) assembling a larger-scale domain-specific dataset can better bridge the domain gap between photographic and medical images via self-supervised continuous pre-training. We hope this benchmark study can direct future research on applying transformers to medical imaging analysis. All codes and pre-trained models are available on our GitHub page https://github.com/JLiangLab/BenchmarkTransformers.</p>","PeriodicalId":72837,"journal":{"name":"Domain adaptation and representation transfer : 4th MICCAI Workshop, DART 2022, held in conjunction with MICCAI 2022, Singapore, September 22, 2022, proceedings. Domain Adaptation and Representation Transfer (Workshop) (4th : 2022 : Sin...","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9646404/pdf/nihms-1846236.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Domain adaptation and representation transfer : 4th MICCAI Workshop, DART 2022, held in conjunction with MICCAI 2022, Singapore, September 22, 2022, proceedings. Domain Adaptation and Representation Transfer (Workshop) (4th : 2022 : Sin...","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/978-3-031-16852-9_2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/9/15 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Visual transformers have recently gained popularity in the computer vision community as they began to outrank convolutional neural networks (CNNs) in one representative visual benchmark after another. However, the competition between visual transformers and CNNs in medical imaging is rarely studied, leaving many important questions unanswered. As the first step, we benchmark how well existing transformer variants that use various (supervised and self-supervised) pre-training methods perform against CNNs on a variety of medical classification tasks. Furthermore, given the data-hungry nature of transformers and the annotation-deficiency challenge of medical imaging, we present a practical approach for bridging the domain gap between photographic and medical images by utilizing unlabeled large-scale in-domain data. Our extensive empirical evaluations reveal the following insights in medical imaging: (1) good initialization is more crucial for transformer-based models than for CNNs, (2) self-supervised learning based on masked image modeling captures more generalizable representations than supervised models, and (3) assembling a larger-scale domain-specific dataset can better bridge the domain gap between photographic and medical images via self-supervised continuous pre-training. We hope this benchmark study can direct future research on applying transformers to medical imaging analysis. All codes and pre-trained models are available on our GitHub page https://github.com/JLiangLab/BenchmarkTransformers.

用于医学图像分类的基准和提升变换器。
视觉变换器最近在计算机视觉领域大受欢迎,因为它们开始在一个又一个具有代表性的视觉基准测试中超越卷积神经网络(CNN)。然而,视觉变换器与卷积神经网络之间在医学成像领域的竞争却鲜有研究,导致许多重要问题悬而未决。作为第一步,我们对使用各种(监督和自我监督)预训练方法的现有变换器变体在各种医学分类任务中与 CNN 的表现进行了基准测试。此外,考虑到变换器的数据饥渴特性和医学影像的标注不足挑战,我们提出了一种实用的方法,利用未标注的大规模域内数据来弥合摄影和医学影像之间的领域差距。我们广泛的经验评估揭示了医学成像领域的以下启示:(1) 与 CNN 相比,良好的初始化对基于变换器的模型更为重要;(2) 与监督模型相比,基于遮蔽图像建模的自监督学习能捕捉到更多可泛化的表征;(3) 通过自监督持续预训练,组建更大规模的特定领域数据集能更好地弥合摄影图像与医学图像之间的领域差距。我们希望这项基准研究能指导未来将变换器应用于医学影像分析的研究。所有代码和预训练模型均可在我们的 GitHub 页面 https://github.com/JLiangLab/BenchmarkTransformers 上获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信