FCT-Net: Efficient Bridge Fusion Incorporating CNN-Transformer Network for Medical Image Segmentation
Bowen Zhou; Xingbo Dong; Xiaowei Zhao; Chenglong Li; Zhe Jin; Huabin Wang
IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 9, no. 6, pp. 762-775
DOI: 10.1109/TRPMS.2024.3516014
Published: 2024-12-16
Article page: https://ieeexplore.ieee.org/document/10804195/
Citations: 0
Abstract
The hybrid architecture of convolutional neural networks (CNNs) and Transformers has gained popularity in medical image segmentation. However, in this hybrid architecture, the semantic gaps between multiscale CNN and Transformer branch features hinder segmentation performance. To address this issue, we propose a new pipeline for medical image segmentation, named FCT-Net. First, we employ large-kernel convolutions in the CNN branch and the shift-window self-attention mechanism in the Transformer branch to construct a parallel three-branch architecture that efficiently captures both global and local information. Second, to better fuse CNN and Transformer features, we introduce the bridge fusion module (BFM), which extracts local features and global representations at different semantic scales by integrating semantic information from the different branches, thereby reducing the semantic gap between features. Finally, to capture multiscale information during encoding, we design the multiscale feature compilation module (MFCM) to adaptively fuse features from different stages of the encoder. Additionally, we introduce residual attention (RA) to enhance the features obtained after encoding, further boosting the network's representational capacity. FCT-Net is evaluated on four medical image segmentation benchmarks, achieving Dice scores of 83.56%, 90.87%, 92.21%, and 91.92% on the COVID-19 Lung dataset, ISIC-2018, SegPC-2021, and ACDC, respectively, outperforming other state-of-the-art methods. Source code will be available at https://github.com/ZBW830/FCTNet.
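The core idea of bridging the two branches can be illustrated with a minimal NumPy sketch: features from a CNN branch and a Transformer branch (same spatial size) are concatenated along the channel axis and projected back to the original channel count with a 1x1-convolution-style linear map. Note this is a generic sketch of branch fusion under assumed shapes, not the paper's actual BFM; the projection weights are random stand-ins for learned parameters, and the function name `bridge_fuse` is hypothetical.

```python
import numpy as np

def bridge_fuse(cnn_feat, trans_feat, rng=None):
    """Illustrative fusion of CNN and Transformer branch features.

    Both inputs have shape (C, H, W). They are concatenated along the
    channel axis and mapped back to C channels by a 1x1 projection
    (random weights here, standing in for a learned fusion layer).
    """
    rng = rng or np.random.default_rng(0)
    c, h, w = cnn_feat.shape
    stacked = np.concatenate([cnn_feat, trans_feat], axis=0)  # (2C, H, W)
    # 1x1 conv == per-pixel linear map over channels
    proj = rng.standard_normal((c, 2 * c)) / np.sqrt(2 * c)
    fused = np.einsum('oc,chw->ohw', proj, stacked)           # (C, H, W)
    return fused

cnn_feat = np.ones((8, 4, 4))     # local features from the CNN branch
trans_feat = np.zeros((8, 4, 4))  # global features from the Transformer branch
fused = bridge_fuse(cnn_feat, trans_feat)
print(fused.shape)  # (8, 4, 4)
```

In a real network the projection would be a learned `Conv2d(2C, C, kernel_size=1)` (or equivalent), typically followed by normalization and attention, so the fused map keeps the encoder's channel width while mixing local and global cues per pixel.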