DSSINet: Dynamic token selection and spatial interaction framework for medical image segmentation

IF 6.8 · CAS Zone 2 (Engineering and Technology) · JCR Q1 ENGINEERING, MULTIDISCIPLINARY

Alexandria Engineering Journal · Pub Date: 2025-10-01 · DOI: 10.1016/j.aej.2025.09.065

Xiaoyu Pan, Weilin Xu, Hanyun Xu

Alexandria Engineering Journal, Volume 130, Pages 954-968. Journal Article. https://www.sciencedirect.com/science/article/pii/S1110016825010294

Citations: 0

Abstract

Medical image segmentation plays a critical role in clinical diagnosis; however, most existing approaches struggle to balance local detail and global context, and they often fail to fully exploit multi-scale features. In this work, we introduce DSSINet, a novel framework that integrates dynamic token selection with spatial interaction to overcome these challenges. To alleviate the high computational cost and background interference inherent to Transformer-based pixel-level self-attention, we develop the Bi-Level Routing Attention (BLRA) module, which performs sparse routing at both region and token levels to focus exclusively on the top-k most relevant areas. To enhance the capture of fine-grained semantic details, we design the Feature Depth Fusion (FDF) module, establishing implicit spatial correlations between low- and high-level features to enrich multi-scale representations. We further incorporate the Adjacent Domain Perception Module (ADPM) to maintain intra-layer consistency and generate auxiliary edge maps that guide the decoding process. Finally, leveraging multi-graph inference, the Reverse Graph Decomposition (RGD) module iteratively reconstructs representations in a coarse-to-fine manner, yielding precisely refined boundaries. Extensive experiments on five public benchmarks demonstrate the effectiveness of DSSINet, achieving IoU scores of 86.21% (ISIC 2016), 85.25% (CVC-ClinicDB), 85.30% (Kvasir-SEG), 92.51% (Chest X-ray), and 94.26% (REFUGE2), outperforming state-of-the-art baselines. Moreover, DSSINet significantly reduces computational costs, with 82.2% fewer FLOPs and 13.1% fewer parameters compared to UNet plus.
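The abstract describes routing at the region level followed by attention at the token level, restricted to the top-k most relevant regions. The following is a minimal NumPy sketch of that general idea, not the authors' BLRA module: the function name `topk_region_routing`, the mean-pooled region descriptors, the dot-product affinity, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import numpy as np


def topk_region_routing(x, region_size, k):
    """Illustrative two-level sparse attention (hypothetical helper).

    x: feature map of shape (H, W, C).
    Returns (out, idx): out has shape (H, W, C); idx has shape (R, k) and
    lists, for each of the R regions, the k regions it attends to.
    """
    H, W, C = x.shape
    s = region_size
    assert H % s == 0 and W % s == 0, "H and W must be divisible by region_size"
    nh, nw = H // s, W // s
    R, T = nh * nw, s * s  # number of regions, tokens per region

    # Partition the map into non-overlapping regions: (R, T, C).
    regions = x.reshape(nh, s, nw, s, C).transpose(0, 2, 1, 3, 4).reshape(R, T, C)

    # Region level: one descriptor per region (assumption: mean of its tokens),
    # then a region-to-region affinity matrix and a top-k selection per row.
    desc = regions.mean(axis=1)                  # (R, C)
    affinity = desc @ desc.T                     # (R, R)
    idx = np.argsort(-affinity, axis=1)[:, :k]   # (R, k)

    # Token level: gather the tokens of the k routed regions as keys/values.
    kv = regions[idx].reshape(R, k * T, C)       # (R, k*T, C)

    # Scaled dot-product attention from each region's tokens to the gathered tokens.
    scores = regions @ kv.transpose(0, 2, 1) / np.sqrt(C)  # (R, T, k*T)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ kv                                     # (R, T, C)

    # Undo the region partition to recover the (H, W, C) layout.
    out = out.reshape(nh, nw, s, s, C).transpose(0, 2, 1, 3, 4).reshape(H, W, C)
    return out, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 8, 4))
    out, idx = topk_region_routing(feats, region_size=4, k=2)
    print(out.shape, idx.shape)
```

In this sketch each token attends to k·s² tokens instead of all H·W tokens, which is where the saving over dense pixel-level self-attention would come from; the actual cost reduction reported in the abstract depends on the full architecture, not on this fragment.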


Source journal

Alexandria Engineering Journal (Engineering: General Engineering)

CiteScore: 11.20
Self-citation rate: 4.40%
Articles published: 1015
Review time: 43 days

About the journal: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. It is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). Papers published in the journal are grouped into five sections:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering