HMDA: A Hybrid Model With Multi-Scale Deformable Attention for Medical Image Segmentation

IF 6.7 · CAS Zone 2 (Medicine) · JCR Q1, Computer Science, Information Systems

IEEE Journal of Biomedical and Health Informatics · Pub Date: 2024-10-07 · DOI: 10.1109/JBHI.2024.3469230

Mengmeng Wu;Tiantian Liu;Xin Dai;Chuyang Ye;Jinglong Wu;Shintaro Funahashi;Tianyi Yan

Volume 29 (2), pages 1243-1255 · Journal Article · Not open access · https://ieeexplore.ieee.org/document/10706868/

Citations: 0

Abstract

Transformers have been applied to medical image segmentation owing to their strong long-range modeling capability, which compensates for the difficulty Convolutional Neural Networks (CNNs) have in extracting global features. However, the standard self-attention modules in Transformers distribute attention in a uniform, inflexible pattern, which frequently introduces computational redundancy on high-dimensional data and impedes the model's ability to concentrate precisely on salient image regions. Additionally, achieving effective explicit interaction between the spatially detailed features captured by CNNs and the long-range contextual features provided by Transformers remains challenging. In this work, we propose a Hybrid Transformer and CNN architecture with Multi-scale Deformable Attention (HMDA), designed to address these issues. Specifically, we introduce a Multi-scale Spatially Adaptive Deformable Attention (MSADA) mechanism, which attends to a small set of key sampling points around a reference point within the multi-scale features instead of attending uniformly to all positions. In addition, we propose the Cross Attention Bridge (CAB) module, which integrates multi-scale Transformer features and local features through channel-wise cross attention, enriching feature synthesis. HMDA is validated on multiple datasets, and the results demonstrate the effectiveness of the approach, which achieves competitive results compared with previous methods.
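The idea of attending to a small set of sampling points around a reference point across several feature scales can be illustrated with a minimal pure-Python sketch. This is not the authors' MSADA implementation: the function names, the pixel-unit offsets, and the choice to pass offsets and attention logits in directly are all assumptions made for illustration. In a trained model those offsets and logits would typically be predicted from the query feature by learned layers.

```python
import math

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat[H][W][C] at fractional pixel (y, x).

    Coordinates outside the map are clamped to its border.
    """
    H, W = len(feat), len(feat[0])
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return [
        (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)
        for a, b, c, d in zip(feat[y0][x0], feat[y0][x1], feat[y1][x0], feat[y1][x1])
    ]

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def deformable_attention(ref, feats, offsets, logits):
    """Aggregate a few sampled points per scale around one reference point.

    ref:     (y, x) reference point, normalized to [0, 1] (assumed convention)
    feats:   list of L feature maps, each indexed as feat[row][col][channel]
    offsets: offsets[l][k] = (dy, dx), in pixels of level l (hypothetical input)
    logits:  logits[l][k], attention logits (hypothetical input); the softmax
             is taken jointly over all L*K sampling points
    """
    weights = softmax([v for level in logits for v in level])
    channels = len(feats[0][0][0])
    out = [0.0] * channels
    i = 0
    for feat, level_offsets in zip(feats, offsets):
        H, W = len(feat), len(feat[0])
        for dy, dx in level_offsets:
            # Map the normalized reference point onto this level, then shift it.
            sample = bilinear_sample(feat, ref[0] * (H - 1) + dy, ref[1] * (W - 1) + dx)
            out = [o + weights[i] * s for o, s in zip(out, sample)]
            i += 1
    return out
```

With L scales and K points per scale, each query touches only L*K locations, whereas dense self-attention compares every query with every position. That difference is the source of the computational saving the abstract alludes to.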




Source journal

IEEE Journal of Biomedical and Health Informatics (Computer Science, Information Systems; Computer Science, Interdisciplinary Applications)

CiteScore: 13.60

Self-citation rate: 6.50%

Articles published: 1151

About the journal: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.