MP-FocalUNet: Multiscale parallel focal self-attention U-Net for medical image segmentation

IF 4.9 · Zone 2 (Medicine) · JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer Methods and Programs in Biomedicine · Pub Date: 2024-12-09 · DOI: 10.1016/j.cmpb.2024.108562

Chuan Wang, Mingfeng Jiang, Yang Li, Bo Wei, Yongming Li, Pin Wang, Guang Yang

Volume 260, Article 108562. Full text: https://www.sciencedirect.com/science/article/pii/S0169260724005558

Citations: 0

Abstract

Background and Objective

Medical image segmentation has improved significantly in recent years with the progress of convolutional neural networks (CNNs). However, because convolution is an inherently local operation, CNNs are poor at learning correlations between global, long-range features. Some existing solutions address this by building deep encoders with down-sampling operations, but such methods tend to produce redundant network structures and lose local detail. Medical image segmentation therefore needs approaches that model global context better while retaining a strong grasp of low-level detail.

Methods

We propose a novel multiscale parallel-branch architecture, MP-FocalUNet. On the encoder side, dual-scale sub-networks extract information at different scales, and a cross-scale Feature Fusion (FF) module is introduced to exploit the potential of the dual-branch design and make full use of feature representations at different scales. On the decoder side, focal self-attention runs in parallel with a conventional CNN to provide long-range modeling, so the network can capture global dependencies and underlying spatial details with a shallower structure.
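The abstract does not say how the FF module combines the two encoder branches. As a rough illustration of cross-scale fusion in general, and not the authors' FF module, the sketch below upsamples a coarse feature map to the resolution of a fine one, concatenates the two along the channel axis, and mixes channels with a 1x1 projection. The function names, tensor shapes, and random weights are all hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_dual_scale(fine, coarse, weight):
    """Generic cross-scale fusion (hypothetical, not the paper's FF module).

    fine:   (C1, H, W) features from the high-resolution branch
    coarse: (C2, H/f, W/f) features from the low-resolution branch
    weight: (C_out, C1 + C2) matrix acting as a 1x1 convolution
    """
    factor = fine.shape[1] // coarse.shape[1]
    coarse_up = upsample_nearest(coarse, factor)          # (C2, H, W)
    stacked = np.concatenate([fine, coarse_up], axis=0)   # (C1 + C2, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", weight, stacked)


rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 16, 16))     # assumed shapes, for illustration
coarse = rng.standard_normal((16, 8, 8))
weight = rng.standard_normal((8, 8 + 16))
fused = fuse_dual_scale(fine, coarse, weight)
print(fused.shape)  # (8, 16, 16)
```

The fused map keeps the spatial resolution of the fine branch, so each output pixel mixes its own fine features with the coarse features of the region that contains it.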

Results

Our proposed method is evaluated on abdominal organ segmentation datasets and automatic cardiac diagnosis challenge datasets. It consistently outperforms several state-of-the-art segmentation methods, with an average Dice score of 82.45% (2.68% higher than HC-Net) on the abdominal organ datasets and 91.44% (0.35% higher than HC-Net) on the automatic cardiac diagnosis challenge datasets.
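For readers unfamiliar with the metric, the Dice score of a predicted mask P against a ground-truth mask G is 2|P ∩ G| / (|P| + |G|), and an average Dice score is typically the mean over the segmented classes. The sketch below is a minimal NumPy version on a toy label map. The toy data, the function names, and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the paper's evaluation protocol.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks: 2|P ∩ G| / (|P| + |G|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / total


def mean_dice(pred_labels, target_labels, class_ids):
    """Average the per-class Dice over the given foreground class ids."""
    scores = [dice_score(pred_labels == c, target_labels == c) for c in class_ids]
    return float(np.mean(scores))


# Toy label maps: 0 = background, 1 and 2 = two hypothetical organs.
pred = np.array([[1, 1, 0],
                 [0, 2, 2]])
target = np.array([[1, 0, 0],
                   [0, 2, 2]])
score = mean_dice(pred, target, class_ids=[1, 2])
print(round(score, 4))  # 0.8333
```

Here class 1 scores 2·1/(2+1) ≈ 0.667 and class 2 scores 1.0, so the mean is about 0.833.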

Conclusions

MP-FocalUNet is a novel encoder-decoder, multiscale parallel-branch Transformer network that addresses the insufficient long-range modeling of CNNs and fuses image information at different scales. Extensive experiments on abdominal and cardiac medical image segmentation tasks show that MP-FocalUNet outperforms other state-of-the-art methods. In future work, we will focus on designing more lightweight Transformer-based models and on better learning the pixel-level intrinsic structural features generated by patch division in vision Transformers.




Source journal

Computer Methods and Programs in Biomedicine (Engineering & Technology: Biomedical Engineering)

CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to promote the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.

Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.