MBT-Polyp: A new Multi-Branch Memory-augmented Transformer for polyp segmentation
Tao Wang, Weijie Wang, Fausto Giunchiglia, Fengzhi Zhao, Ye Zhang, Duo Yu, Guixia Liu
Image and Vision Computing, Volume 163, Article 105747. Published 2025-09-27. DOI: 10.1016/j.imavis.2025.105747. Available at: https://www.sciencedirect.com/science/article/pii/S026288562500335X
Polyp segmentation plays a critical role in the early diagnosis and precise clinical intervention of colorectal cancer (CRC). Despite significant advancements in deep learning for medical image segmentation, accurate localization of small polyps and precise delineation of polyp boundaries remain challenges in colorectal polyp segmentation. In this study, we introduce MBT-Polyp, a Multi-branch Memory-augmented Transformer architecture designed to improve segmentation sensitivity for small polyps and enhance the delineation accuracy of ambiguous polyp boundaries. At the core of our framework is MemoryFormer, a Transformer-based U-shaped architecture that incorporates three key components: a Dynamic Focal Attention block (DFA) for efficient small target enhancement and edge refinement, a High-Level Memory Attention Module (HMAM) for preserving boundary details via cross-resolution fusion, and a Multi-View Channel Memory Attention Module (MCMAM) for suppressing background noise and modeling local spatial context. To guide specialized learning, we derive small polyp and edge labels alongside ground truth, enabling MemoryFormer to process them through dedicated branches. The outputs are fused using a Small Polyp Fusion Strategy (SPFS) and an Edge Correction Strategy (ECS) to alleviate over- and under-segmentation. The quantitative results on Kvasir-SEG, CVC-ColonDB, CVC-ClinicDB, CVC-300, and ETIS-Larib yield mean Dice scores of 0.930, 0.818, 0.943, 0.912, and 0.763, respectively, demonstrating strong generalization across diverse polyp segmentation scenarios. Code and datasets are available at: https://github.com/taojlu/PolypSeg.
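The multi-branch fusion described above (SPFS merging the small-polyp branch, ECS correcting ambiguous boundaries) can be illustrated with a minimal sketch. Note the actual strategies are defined in the paper and the linked repository; the function name `fuse_branches`, the thresholds `small_area_thresh` and `edge_band`, and the max/round fusion rules below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def fuse_branches(main_prob, small_prob, edge_prob,
                  small_area_thresh=0.01, edge_band=0.4):
    """Hypothetical sketch of fusing three branch outputs.

    main_prob, small_prob, edge_prob: HxW probability maps in [0, 1]
    produced by the main, small-polyp, and edge branches, respectively.
    """
    fused = main_prob.copy()

    # Small-polyp fusion (sketch): if the main branch predicts only a
    # tiny foreground area, trust the small-polyp branch more by taking
    # the element-wise maximum, countering under-segmentation.
    if (main_prob > 0.5).mean() < small_area_thresh:
        fused = np.maximum(fused, small_prob)

    # Edge correction (sketch): near predicted edges, push ambiguous
    # probabilities toward 0 or 1, countering over-segmentation of
    # blurry boundaries.
    near_edge = edge_prob > edge_band
    fused = np.where(near_edge, np.round(fused), fused)
    return fused
```

A real implementation would operate on logits inside the network and learn the fusion weights, but the sketch conveys the intent: dedicated branches specialize, and their outputs are reconciled rather than averaged.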
Journal description:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.