DSSINet: Dynamic token selection and spatial interaction framework for medical image segmentation
Authors: Xiaoyu Pan, Weilin Xu, Hanyun Xu
Journal: Alexandria Engineering Journal, volume 130, pages 954-968
Publication date: 2025-10-01
Publication type: Journal Article
DOI: 10.1016/j.aej.2025.09.065
URL: https://www.sciencedirect.com/science/article/pii/S1110016825010294
Impact factor: 6.8 (JCR Q1, Engineering, Multidisciplinary)
Citation count: 0
Abstract
Medical image segmentation plays a critical role in clinical diagnosis; however, most existing approaches struggle to balance local detail and global context, and they often fail to fully exploit multi-scale features. In this work, we introduce DSSINet, a novel framework that integrates dynamic token selection with spatial interaction to overcome these challenges. To alleviate the high computational cost and background interference inherent to Transformer-based pixel-level self-attention, we develop the Bi-Level Routing Attention (BLRA) module, which performs sparse routing at both region and token levels to focus exclusively on the top-k most relevant areas. To enhance the capture of fine-grained semantic details, we design the Feature Depth Fusion (FDF) module, establishing implicit spatial correlations between low- and high-level features to enrich multi-scale representations. We further incorporate the Adjacent Domain Perception Module (ADPM) to maintain intra-layer consistency and generate auxiliary edge maps that guide the decoding process. Finally, leveraging multi-graph inference, the Reverse Graph Decomposition (RGD) module iteratively reconstructs representations in a coarse-to-fine manner, yielding precisely refined boundaries. Extensive experiments on five public benchmarks demonstrate the effectiveness of DSSINet, achieving IoU scores of 86.21% (ISIC 2016), 85.25% (CVC-ClinicDB), 85.30% (Kvasir-SEG), 92.51% (Chest X-ray), and 94.26% (REFUGE2), outperforming state-of-the-art baselines. Moreover, DSSINet significantly reduces computational costs, with 82.2% fewer FLOPs and 13.1% fewer parameters compared to UNet plus.
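The abstract gives no formulas for BLRA, so the following is only a minimal sketch of the general idea of top-k region routing, not the authors' implementation. It assumes that region descriptors are the mean of each region's token vectors, that region affinity is a plain dot product, and that queries, keys, and values are the raw tokens with no learned projections. The token-level routing stage is omitted, and all function names are hypothetical.

```python
import math


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_region_routing(regions, k):
    """For each region, return the indices of its k highest-affinity regions.

    Assumption: affinity is the dot product of mean-pooled region descriptors.
    """
    descriptors = [mean_vec(r) for r in regions]
    routes = []
    for q in descriptors:
        scores = [dot(q, d) for d in descriptors]
        order = sorted(range(len(regions)), key=lambda j: scores[j], reverse=True)
        routes.append(order[:k])
    return routes


def routed_attention(regions, k):
    """Attend from each token only to tokens in its region's top-k routed regions."""
    routes = topk_region_routing(regions, k)
    dim = len(regions[0][0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for i, region in enumerate(regions):
        # Gather keys/values from the routed regions only (sparse attention).
        gathered = [tok for j in routes[i] for tok in regions[j]]
        out_region = []
        for q in region:
            weights = softmax([dot(q, kv) * scale for kv in gathered])
            out_region.append(
                [sum(w * kv[d] for w, kv in zip(weights, gathered)) for d in range(dim)]
            )
        output.append(out_region)
    return routes, output


# Four toy regions with two 2-D tokens each (made-up values for illustration).
regions = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[1.0, 0.1], [0.8, 0.0]],
    [[-1.0, -1.0], [-0.9, -1.0]],
]
routes, output = routed_attention(regions, k=2)
print(routes)
```

With k regions of m tokens each, every query scores only k·m keys instead of all tokens, which is the kind of saving the abstract attributes to sparse routing.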
About the journal:
Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering