Junding Sun, Yabei Li, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang
Title: HAD-Net: An attention U-based network with hyper-scale shifted aggregating and max-diagonal sampling for medical image segmentation
Journal: Computer Vision and Image Understanding (impact factor 4.3; JCR Q2, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.cviu.2024.104151
Published: 2024-09-07
URL: https://www.sciencedirect.com/science/article/pii/S1077314224002327
Citation count: 0
Abstract
Objectives:
Accurate extraction of regions of interest (ROIs) with variable shapes and scales is one of the primary challenges in medical image segmentation. Current U-based networks mostly aggregate multi-stage encoding outputs into an improved multi-scale skip connection. Although this design has been shown to provide scale diversity and contextual integrity, several limitations remain: (i) the encoding outputs are simply resampled to the same size, which destroys fine-grained information, so the advantages of using multiple scales are not fully realized; (ii) redundant information proportional to the feature dimension size is introduced and causes multi-stage interference; and (iii) the precision of information delivery relies on the up-sampling and down-sampling layers, but guidance on maintaining consistency in feature locations and trends between them is lacking.
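Limit (i) can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes plain 2×2 average pooling as the resampling step (the function name `avg_pool_2x2` is hypothetical) and shows how an isolated fine detail in a shallow feature map is diluted once the map is shrunk to match a deeper stage.

```python
def avg_pool_2x2(fmap):
    """Downsample a 2D list by averaging non-overlapping 2x2 blocks.

    Assumption: this stands in for a generic "resample to the same size" step;
    the abstract does not say which resampling the baseline networks use.
    """
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# A shallow-stage map with a single high-response pixel at row 1, column 2.
shallow = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

resampled = avg_pool_2x2(shallow)
print(resampled)  # the one-pixel response is averaged into a coarser cell
```

After resampling, the exact position of the detail inside its 2×2 block can no longer be recovered, which is the kind of loss the abstract attributes to same-size resampling.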
Methods:
To address these limitations, this paper proposes a U-based CNN named HAD-Net. It assembles a new hyper-scale shifted aggregating module (HSAM) paradigm and progressive reusing attention (PRA) for the skip connections, and employs a novel pair of dual-branch, parameter-free sampling layers: max-diagonal pooling (MDP) and max-diagonal un-pooling (MDUP). First, the aggregating scheme additionally combines five subregions with certain offsets in the shallower stage, since the lower scale-down ratios of the subregions enrich the scales and the fine-grained context. Second, the attention scheme comprises a partial-to-global channel attention (PGCA) and a multi-scale reusing spatial attention (MRSA); it builds reusing connections internally and shifts the focus toward more useful dimensions. Finally, MDP and MDUP are used in pairs to improve texture delivery and feature consistency, enhancing information retention and avoiding positional confusion.
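The abstract does not define MDP or MDUP, so the sketch below does not attempt to reproduce them. As a point of reference only, it shows the standard parameter-free pairing of 2×2 max pooling with index-based un-pooling, which is one common way to keep feature locations consistent between down-sampling and up-sampling. All function names here are hypothetical.

```python
def max_pool_with_indices(fmap):
    """Standard 2x2 max pooling that also records where each maximum came from.

    Note: this is ordinary max pooling, not the paper's max-diagonal pooling.
    """
    rows, cols = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for r in range(0, rows - 1, 2):
        pooled_row, index_row = [], []
        for c in range(0, cols - 1, 2):
            # Candidate positions inside the current 2x2 window.
            window = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            best = max(window, key=lambda pos: fmap[pos[0]][pos[1]])
            pooled_row.append(fmap[best[0]][best[1]])
            index_row.append(best)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, rows, cols):
    """Put each pooled value back at its recorded position; other cells stay zero."""
    out = [[0.0] * cols for _ in range(rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 5.0],
    [0.0, 6.0, 7.0, 0.0],
    [0.0, 0.0, 0.0, 8.0],
]

pooled, indices = max_pool_with_indices(fmap)
restored = max_unpool(pooled, indices, rows=4, cols=4)
```

Because the un-pooling step reuses the positions recorded during pooling, each retained response returns to its original location rather than to an arbitrary cell of the up-sampled window.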
Results:
Compared with state-of-the-art networks, HAD-Net achieves comparable or better performance, with Dice scores of 90.13%, 81.51%, and 75.43% for the respective classes on BraTS20, 89.59% Dice and 98.56% AUC on Kvasir-SEG, and 82.17% Dice and 98.05% AUC on DRIVE.
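For readers unfamiliar with the Dice score reported above, the following minimal sketch computes it for two binary masks using the usual definition, twice the overlap divided by the sum of the two mask sizes. The masks are made-up toy data, not results from the paper, and the handling of two empty masks is an assumed convention.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (illustrative only): 1 marks a foreground pixel.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]

score = dice_coefficient(pred, target)
print(f"Dice = {score:.2%}")
```

For multi-class datasets, the score is typically computed once per class on that class's binary mask, which matches the per-class reporting for BraTS20.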
Conclusions:
The combination of HSAM, PRA, MDP, and MDUP is shown to yield a remarkable improvement and leaves room for further research.
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems