HAD-Net: An attention U-based network with hyper-scale shifted aggregating and max-diagonal sampling for medical image segmentation

IF 4.3 · Zone 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding · Pub Date: 2024-09-07 · DOI: 10.1016/j.cviu.2024.104151

Junding Sun, Yabei Li, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang

Volume 249, Article 104151 · Journal Article · https://www.sciencedirect.com/science/article/pii/S1077314224002327

Citations: 0


HAD-Net: An attention U-based network with hyper-scale shifted aggregating and max-diagonal sampling for medical image segmentation

Objectives:

Accurate extraction of regions of interest (ROIs) with variable shapes and scales is one of the primary challenges in medical image segmentation. Current U-based networks mostly aggregate multi-stage encoder outputs into an improved multi-scale skip connection. Although this design has been shown to provide scale diversity and contextual integrity, several limitations remain: (i) the encoder outputs are simply resampled to the same size, which destroys fine-grained information, so the advantages of using multiple scales are not fully exploited; (ii) redundant information proportional to the feature dimension size is introduced and causes interference between stages; and (iii) the precision of information delivery relies on the up-sampling and down-sampling layers, but guidance on keeping feature locations and trends consistent between them is lacking.
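Limitation (i) can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes plain 2x2 average pooling followed by nearest-neighbour up-sampling as the "simple" resampling, and the helper names `avg_pool2x2` and `upsample_nearest2x` are hypothetical. It only shows how a one-pixel-wide structure is smeared once a feature map has been brought to a coarser size and back.

```python
def avg_pool2x2(img):
    """Downsample a 2D list by averaging non-overlapping 2x2 blocks."""
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def upsample_nearest2x(img):
    """Upsample a 2D list by repeating every value over a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Toy 4x4 feature map with a one-pixel-wide diagonal line (illustrative data).
feature = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

restored = upsample_nearest2x(avg_pool2x2(feature))
for row in restored:
    print(row)
# The thin diagonal comes back as two blurred 2x2 blocks of 0.5:
# the exact pixel positions of the line are no longer recoverable.
```

Any resampling scheme that maps several input positions to one value has this property to some degree, which is the motivation the abstract gives for preserving finer-scale context in the skip connections.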

Methods:

To address these limitations, this paper proposes a U-based CNN named HAD-Net. It assembles a new hyper-scale shifted aggregating module (HSAM) paradigm and progressive reusing attention (PRA) for the skip connections, and employs a novel pair of dual-branch, parameter-free sampling layers: max-diagonal pooling (MDP) and max-diagonal un-pooling (MDUP). First, the aggregating scheme additionally combines five subregions with certain offsets in the shallower stage, since the lower scale-down ratios of the subregions enrich the scales and the fine-grained context. Second, the attention scheme contains a partial-to-global channel attention (PGCA) and a multi-scale reusing spatial attention (MRSA); it builds reusing connections internally and shifts the focus to more useful dimensions. Finally, MDP and MDUP are used in pairs to improve texture delivery and feature consistency, enhancing information retention and avoiding positional confusion.
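The abstract does not define MDP and MDUP, so the following is explicitly not the paper's method. It is a minimal sketch of the classical index-preserving max pooling and un-pooling pair, a well-known parameter-free way of keeping feature locations consistent between a down-sampling layer and its matching up-sampling layer. The function names and the toy data are assumptions made for illustration.

```python
def max_pool2x2_with_indices(img):
    """2x2 max pooling that also records where each maximum came from."""
    h, w = len(img), len(img[0])
    pooled, indices = [], []
    for r in range(0, h, 2):
        pooled_row, index_row = [], []
        for c in range(0, w, 2):
            # Collect (value, position) for the four cells of the window.
            window = [(img[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, pos = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(pos)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool2x2(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


# Toy 4x4 feature map (illustrative data only).
feature = [
    [1, 5, 2, 0],
    [3, 4, 8, 1],
    [0, 2, 1, 1],
    [7, 6, 3, 9],
]

pooled, indices = max_pool2x2_with_indices(feature)
restored = max_unpool2x2(pooled, indices, 4, 4)
print(pooled)    # [[5, 8], [7, 9]]
print(restored)  # maxima return to their original coordinates
```

Because the decoder reuses positions recorded by the encoder, no learned parameters are needed and the restored maxima cannot drift. How MDP and MDUP use their two branches and the diagonal is described only in the full paper.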

Results:

Compared with state-of-the-art networks, HAD-Net achieves comparable or better performance: Dice scores of 90.13%, 81.51%, and 75.43% for the respective classes on BraTS20; 89.59% Dice and 98.56% AUC on Kvasir-SEG; and 82.17% Dice and 98.05% AUC on DRIVE.
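For readers unfamiliar with the Dice metric reported above, here is a minimal sketch of the standard definition for binary masks, Dice = 2|P ∩ T| / (|P| + |T|). The function name `dice_score`, the flat-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions of this example; the paper's exact evaluation protocol is not given in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient for two binary masks given as flat lists of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (illustrative data only): 3 predicted pixels, 3 true pixels,
# 2 of them shared, so Dice = 2 * 2 / (3 + 3).
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"{dice_score(pred, target):.4f}")  # 0.6667
```

A multi-class result such as the three BraTS20 figures is typically obtained by computing this score once per class on the corresponding binary mask.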

Conclusions:

The combination of HSAM, PRA, MDP, and MDUP is shown to be a remarkable improvement and leaves room for further research.


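Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days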

About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include: theory; early vision; data structures and representations; shape; range; motion; matching and recognition; architecture and languages; vision systems.