HCT-Unet: multi-target medical image segmentation via a hybrid CNN-transformer Unet incorporating multi-axis gated multi-layer perceptron

Yazhuo Fan, Jianhua Song, Lei Yuan, Yunlin Jia
{"title":"HCT-Unet: multi-target medical image segmentation via a hybrid CNN-transformer Unet incorporating multi-axis gated multi-layer perceptron","authors":"Yazhuo Fan, Jianhua Song, Lei Yuan, Yunlin Jia","doi":"10.1007/s00371-024-03612-y","DOIUrl":null,"url":null,"abstract":"<p>In recent years, for the purpose of integrating the individual strengths of convolutional neural networks (CNN) and Transformer, a network structure has been built to integrate the two methods in medical image segmentation. But most of the methods only integrate CNN and Transformer at a single level and cannot extract low-level detail features and high-level abstract information simultaneously. Meanwhile, this structure lacks flexibility, unable to dynamically adjust the contributions of different feature maps. To address these limitations, we introduce HCT-Unet, a hybrid CNN-Transformer model specifically designed for multi-organ medical images segmentation. HCT-Unet introduces a tunable hybrid paradigm that differs significantly from conventional hybrid architectures. It deploys powerful CNN to capture short-range information and Transformer to extract long-range information at each stage. Furthermore, we have designed a multi-functional multi-scale fusion bridge, which progressively integrates information from different scales and dynamically modifies attention weights for both local and global features. With the benefits of these two innovative designs, HCT-Unet demonstrates robust discriminative dependency and representation capabilities in multi-target medical image tasks. Experimental results reveal the remarkable performance of our approach in medical image segmentation tasks. Specifically, in multi-organ segmentation tasks, HCT-Unet achieved a Dice similarity coefficient (DSC) of 82.23%. Furthermore, in cardiac segmentation tasks, it reached a DSC of 91%, significantly outperforming previous state-of-the-art networks. The code has been released on Zenodo: https://zenodo.org/doi/10.5281/zenodo.11070837.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"23 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03612-y","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In recent years, network structures that combine convolutional neural networks (CNNs) and Transformers have been built to exploit the complementary strengths of the two methods in medical image segmentation. However, most of these methods integrate the CNN and the Transformer at only a single level and therefore cannot extract low-level detail features and high-level abstract information simultaneously. Such structures also lack flexibility, as they cannot dynamically adjust the contributions of different feature maps. To address these limitations, we introduce HCT-Unet, a hybrid CNN-Transformer model designed for multi-organ medical image segmentation. HCT-Unet introduces a tunable hybrid paradigm that differs significantly from conventional hybrid architectures: at each stage it deploys a CNN to capture short-range information and a Transformer to extract long-range information. Furthermore, we design a multi-functional multi-scale fusion bridge that progressively integrates information from different scales and dynamically adjusts the attention weights of local and global features. With these two designs, HCT-Unet demonstrates robust dependency modeling and representation capabilities in multi-target medical image tasks. Experimental results show the strong performance of our approach: in multi-organ segmentation, HCT-Unet achieved a Dice similarity coefficient (DSC) of 82.23%, and in cardiac segmentation it reached a DSC of 91%, significantly outperforming previous state-of-the-art networks. The code has been released on Zenodo: https://zenodo.org/doi/10.5281/zenodo.11070837.
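To make the two central ideas of the abstract concrete, the following is a minimal, illustrative sketch (assuming a PyTorch implementation; it is not the authors' released code, which is available at the Zenodo link above). It shows a hybrid stage that pairs a CNN branch for short-range detail with a Transformer branch for long-range context, fused by a learnable gate that dynamically re-weights the local and global feature maps. All module names, channel sizes, and the gating scheme are assumptions chosen for readability.

```python
# Illustrative sketch only: a hybrid CNN-Transformer stage with dynamic gated fusion.
# Not the HCT-Unet implementation; names and sizes are hypothetical.
import torch
import torch.nn as nn


class HybridStage(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # CNN branch: captures short-range, low-level detail.
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Transformer branch: self-attention over flattened spatial tokens for long-range context.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Gate: predicts per-channel weights that balance the two branches dynamically.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local_branch(x)                      # (B, C, H, W)

        tokens = x.flatten(2).transpose(1, 2)             # (B, H*W, C)
        tokens = self.norm(tokens)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)

        # Dynamic fusion: gate in [0, 1] decides how much of each branch to keep.
        g = self.gate(torch.cat([local, global_feat], dim=1))
        return g * local + (1.0 - g) * global_feat


if __name__ == "__main__":
    stage = HybridStage(channels=64)
    out = stage(torch.randn(1, 64, 56, 56))
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```

In a full encoder, several such stages would be stacked at different resolutions, and a multi-scale fusion bridge would combine their outputs before the decoder; this sketch only illustrates the per-stage pairing and gated fusion described in the abstract.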

