Cats: Complementary CNN and Transformer Encoders for Segmentation

2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), pp. 1-5. Published: 2022-03-28. DOI: 10.1109/ISBI52829.2022.9761596

Hao Li, Dewei Hu, Han Liu, Jiacheng Wang, I. Oguz


Citations: 13

Abstract



Recently, deep learning methods have achieved state-of-the-art performance in many medical image segmentation tasks. Many of these are based on convolutional neural networks (CNNs). For such methods, the encoder is the key part for extracting global and local information from input images; the extracted features are then passed to the decoder to predict the segmentations. In contrast, several recent works show superior performance with the use of transformers, which can better model long-range spatial dependencies and capture low-level details. However, a transformer used as the sole encoder underperforms on some tasks, where it cannot efficiently replace the convolution-based encoder. In this paper, we propose a model with double encoders for 3D biomedical image segmentation. Our model is a U-shaped CNN augmented with an independent transformer encoder. We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results. We evaluate our method on three public datasets from three different challenges: BTCV, MoDA and Decathlon. Compared to the state-of-the-art models with and without transformers on each task, our proposed method obtains higher Dice scores across the board.
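The abstract reports results as Dice scores but does not spell out the metric. As a minimal sketch, the standard Dice coefficient for one label is 2|A∩B| / (|A| + |B|), where A and B are the predicted and reference voxel sets. The function name `dice_score`, the toy label maps, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; they are not taken from the paper, and the challenges named above may apply their own evaluation protocols.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int = 1) -> float:
    """Dice coefficient for one label over two flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    # Count voxels carrying this label in each map, and in both at once.
    pred_hits = sum(1 for p in pred if p == label)
    target_hits = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    denom = pred_hits + target_hits
    # Assumed convention: if the label is absent from both maps, treat it as
    # perfect agreement instead of dividing by zero.
    if denom == 0:
        return 1.0
    return 2.0 * overlap / denom


# Hypothetical flattened label maps: 0 = background, 1 and 2 = two structures.
prediction = [0, 1, 1, 0, 2, 2, 0, 0]
reference = [0, 1, 1, 1, 2, 0, 0, 0]

# Per-label Dice, then the unweighted mean over foreground labels.
per_label = {label: dice_score(prediction, reference, label) for label in (1, 2)}
mean_dice = sum(per_label.values()) / len(per_label)
```

In this toy case label 1 overlaps on 2 voxels out of 2 predicted and 3 reference voxels, giving 4/5, and label 2 overlaps on 1 voxel out of 2 predicted and 1 reference voxel, giving 2/3. Averaging per-label scores is one common way to summarize a multi-organ task; whether the paper averages this way is not stated in the abstract.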
