Uni4Eye++: A General Masked Image Modeling Multi-modal Pre-training Framework for Ophthalmic Image Classification and Segmentation.

Zhiyuan Cai, Li Lin, Huaqing He, Pujin Cheng, Xiaoying Tang
IEEE Transactions on Medical Imaging. Published 2024-07-02. DOI: 10.1109/TMI.2024.3422102 (https://doi.org/10.1109/TMI.2024.3422102)

Abstract

A large-scale labeled dataset is a key factor in the success of supervised deep learning in most ophthalmic image analysis scenarios. However, annotated data are often limited in ophthalmic image analysis, because manual annotation is time-consuming and labor-intensive. Self-supervised learning (SSL) methods offer a way to make better use of unlabeled data, as they do not require large-scale annotation. To use as many unlabeled ophthalmic images as possible, it is necessary to break the dimension barrier, making use of both 2D and 3D images simultaneously while alleviating catastrophic forgetting. In this paper, we propose a universal self-supervised Transformer framework named Uni4Eye++ to discover intrinsic image characteristics and capture domain-specific feature embeddings in ophthalmic images. Uni4Eye++ can serve as a global feature extractor and is built on a Masked Image Modeling task with a Vision Transformer architecture. Building on our previous work Uni4Eye, we further employ an image-entropy-guided masking strategy to reconstruct more informative patches and a dynamic head generator module to alleviate modality confusion. We evaluate the pre-trained Uni4Eye++ encoder by fine-tuning it on multiple downstream ophthalmic image classification and segmentation tasks. Comparisons with other state-of-the-art SSL pre-training methods demonstrate the superiority of Uni4Eye++. Our code is available on GitHub.
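
The abstract does not say how the entropy-guided masking is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a 2D grayscale image with intensities in [0, 1], scores each non-overlapping patch by the Shannon entropy of its intensity histogram, and samples the patches to mask with probability weighted by that score. The function names, the 32-bin histogram, the 16-pixel patch size, and the 0.75 mask ratio are all hypothetical choices made for illustration.

```python
import math
import random


def patch_entropy(patch, bins=32):
    """Shannon entropy (bits) of the intensity histogram of one patch.

    `patch` is a list of rows of floats in [0, 1].
    """
    counts = [0] * bins
    total = 0
    for row in patch:
        for value in row:
            # Clamp so that value == 1.0 falls in the last bin.
            counts[min(int(value * bins), bins - 1)] += 1
            total += 1
    entropy = 0.0
    for count in counts:
        if count:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into square patches, row-major."""
    rows = len(image) // patch_size
    cols = len(image[0]) // patch_size
    patches = []
    for r in range(rows):
        band = image[r * patch_size:(r + 1) * patch_size]
        for c in range(cols):
            patches.append(
                [row[c * patch_size:(c + 1) * patch_size] for row in band]
            )
    return patches


def entropy_guided_mask(image, patch_size=16, mask_ratio=0.75, seed=0):
    """Return one boolean per patch; True means the patch is masked.

    Assumption: higher-entropy patches should be masked more often.
    """
    rng = random.Random(seed)
    patches = split_into_patches(image, patch_size)
    # Small constant keeps flat (zero-entropy) patches selectable.
    weights = [patch_entropy(p) + 1e-6 for p in patches]
    num_masked = round(mask_ratio * len(patches))
    # Weighted sampling without replacement via exponential races:
    # each patch draws a waiting time with rate equal to its weight,
    # and the patches with the shortest times are selected.
    times = [rng.expovariate(w) for w in weights]
    ranked = sorted(range(len(patches)), key=lambda i: times[i])
    chosen = set(ranked[:num_masked])
    return [i in chosen for i in range(len(patches))]


if __name__ == "__main__":
    rng = random.Random(1)
    image = [[rng.random() for _ in range(64)] for _ in range(64)]
    mask = entropy_guided_mask(image)
    print(sum(mask), "of", len(mask), "patches masked")
```

Sampling rather than always taking the highest-entropy patches keeps some randomness in which patches are hidden; whether Uni4Eye++ does this, or how it extends the idea to 3D volumes, is not stated in the abstract.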
