Multi-scale interaction and locally enhanced bridging network for medical image segmentation
Zhiyong Huang, Shiyao Zhou, Zhi Yu, Mingyang Hou, Zhiyu Zhao, Xiaoyu Li, Jiahong Wang, Yan Yan, Yushi Liu, Hans Gregersen
Computerized Medical Imaging and Graphics, Volume 124, Article 102610. Published 2025-07-22. Impact factor: 4.9 (JCR Q1, Engineering, Biomedical).
DOI: 10.1016/j.compmedimag.2025.102610
https://www.sciencedirect.com/science/article/pii/S0895611125001193
Abstract
Accurate organ segmentation is crucial for precise medical diagnosis. Recent methods based on CNNs and Transformers have significantly improved automatic medical image segmentation. However, their encoders and decoders often rely on simple skip connections, which fail to integrate multi-scale features effectively. This causes a misalignment between low-resolution global features and high-resolution spatial information. As a result, segmentation accuracy suffers, particularly for global contours and local details. To address this limitation, MILENet, a multi-scale interaction and locally enhanced bridging network, is proposed. The proposed context bridge incorporates a multi-scale interaction module to reorganize multi-scale features and ensure global correlation. Additionally, a local enhancement module is introduced. It includes a dilated coordinate attention mechanism and a locally enhanced FFN built with a cascaded convolutional structure. This module enhances local context modeling and improves feature discrimination. Furthermore, a source-driven connection mechanism is introduced to preserve detailed information across layers, providing richer features for decoder reconstruction. By leveraging these innovations, MILENet effectively aligns multi-scale features and enhances local details, thereby improving segmentation accuracy. MILENet has been evaluated on publicly available datasets spanning abdominal CT (Synapse), cardiac MRI (ACDC), and colonoscopy RGB images (Kvasir, CVC-ClinicDB, CVC-ColonDB, CVC-300, and ETIS-LaribDB). The results show that MILENet achieves state-of-the-art performance across different modalities. It effectively handles both large-organ segmentation in CT/MRI and fine-grained polyp delineation in endoscopic images, demonstrating strong generalizability to diverse anatomical structures and imaging conditions. The code has been released on GitHub: https://github.com/syzhou1226/MILENET.
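To make the idea of reorganizing multi-scale features more concrete, the following is a minimal sketch of one common pattern: flattening feature maps from several resolutions into a single token sequence (so a later global operation could relate all scales) and then splitting the sequence back into per-scale maps. This is not the authors' implementation, which is available in the repository linked above. The helper names (`flatten_to_tokens`, `concat_scales`, `split_scales`, `make_map`), the nested-list representation, and the assumption that every scale has the same channel count are all illustrative assumptions.

```python
from typing import List, Tuple

# A feature map stored as H x W x C nested lists (illustrative assumption).
FeatureMap = List[List[List[float]]]
Tokens = List[List[float]]


def flatten_to_tokens(fmap: FeatureMap) -> Tokens:
    """Turn an H x W x C map into H*W tokens of length C, in row-major order."""
    return [pixel for row in fmap for pixel in row]


def concat_scales(fmaps: List[FeatureMap]) -> Tuple[Tokens, List[Tuple[int, int]]]:
    """Concatenate tokens from every scale and record each scale's (H, W)."""
    tokens: Tokens = []
    shapes: List[Tuple[int, int]] = []
    for fmap in fmaps:
        shapes.append((len(fmap), len(fmap[0])))
        tokens.extend(flatten_to_tokens(fmap))
    return tokens, shapes


def split_scales(tokens: Tokens, shapes: List[Tuple[int, int]]) -> List[FeatureMap]:
    """Undo concat_scales: cut the sequence back into per-scale H x W x C maps."""
    maps: List[FeatureMap] = []
    start = 0
    for h, w in shapes:
        chunk = tokens[start:start + h * w]
        maps.append([chunk[r * w:(r + 1) * w] for r in range(h)])
        start += h * w
    return maps


def make_map(h: int, w: int, c: int, fill: float) -> FeatureMap:
    """Build a constant-valued H x W x C map (hypothetical test data)."""
    return [[[fill] * c for _ in range(w)] for _ in range(h)]


# A toy three-level pyramid; real encoders would produce learned features,
# and channel counts would typically be aligned by a projection first.
pyramid = [make_map(4, 4, 2, 1.0), make_map(2, 2, 2, 2.0), make_map(1, 1, 2, 3.0)]
tokens, shapes = concat_scales(pyramid)
restored = split_scales(tokens, shapes)
```

In this sketch the joint sequence holds 16 + 4 + 1 tokens. Any cross-scale operation would act on `tokens` before `split_scales` hands the per-scale maps back to the decoder.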
About the journal:
The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.