Efficient Video Polyp Segmentation by Deformable Alignment and Local Attention
Authors: Yifei Zhao, Xiaoying Wang, Junping Yin
Journal: IEEE Journal of Biomedical and Health Informatics (Impact Factor 6.8; JCR Q1, Computer Science, Information Systems)
Published: 2025-07-25
DOI: 10.1109/JBHI.2025.3592897 (https://doi.org/10.1109/JBHI.2025.3592897)
Citation count: 0
Abstract
Accurate and efficient Video Polyp Segmentation (VPS) is vital for the early detection of colorectal cancer and the effective treatment of polyps. However, achieving this remains highly challenging due to the inherent difficulty in modeling the spatial-temporal relationships within colonoscopy videos. Existing methods that directly associate video frames frequently fail to account for variations in polyp or background motion, leading to excessive noise and reduced segmentation accuracy. Conversely, approaches that rely on optical flow models to estimate motion and align frames incur significant computational overhead. To address these limitations, we propose a novel VPS framework, termed Deformable Alignment and Local Attention (DALA). In this framework, we first construct a shared encoder to jointly encode the feature representations of paired video frames. Subsequently, we introduce a Multi-Scale Frame Alignment (MSFA) module based on deformable convolution to estimate the motion between reference and anchor frames. The multi-scale architecture is designed to accommodate the scale variations of polyps arising from differing viewing angles and speeds during colonoscopy. Furthermore, Local Attention (LA) is employed to selectively aggregate the aligned features, yielding more precise spatial-temporal feature representations. Extensive experiments conducted on the challenging SUN-SEG and PolypGen datasets demonstrate that DALA achieves superior performance compared to state-of-the-art models. The code will be publicly available at https://github.com/xff12138/DALA.
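The abstract does not specify how Local Attention is computed, so the following is only a minimal NumPy sketch of one common form of windowed attention, not the authors' implementation. It assumes scaled dot-product similarity, a softmax over a square neighbourhood, and reference features that have already been aligned to the anchor frame (the deformable-convolution alignment step is omitted). The function name `local_attention` and the `window` parameter are hypothetical.

```python
import numpy as np


def local_attention(anchor, reference, window=3):
    """Aggregate reference features into the anchor frame.

    Each anchor position attends only to the window x window neighbourhood
    around the same position in the (assumed already aligned) reference map.
    Both inputs have shape (C, H, W); the output has shape (C, H, W).
    This is an illustrative assumption, not the DALA code.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd")
    c, h, w = anchor.shape
    r = window // 2

    # Zero-pad the reference spatially and keep a mask of real positions,
    # so that padded positions receive no attention weight.
    padded = np.pad(reference, ((0, 0), (r, r), (r, r)))
    valid = np.pad(np.ones((h, w), dtype=bool), r)

    logits = np.full((window * window, h, w), -np.inf)
    neighbours = np.zeros((window * window, c, h, w))
    for idx, (dy, dx) in enumerate(np.ndindex(window, window)):
        shifted = padded[:, dy:dy + h, dx:dx + w]
        mask = valid[dy:dy + h, dx:dx + w]
        # Scaled dot-product similarity between anchor and shifted reference.
        score = (anchor * shifted).sum(axis=0) / np.sqrt(c)
        logits[idx] = np.where(mask, score, -np.inf)
        neighbours[idx] = shifted

    # Softmax over the window; the centre offset is always valid,
    # so the per-position maximum is finite.
    logits -= logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)

    # Attention-weighted sum of neighbouring reference features.
    return (weights[:, None] * neighbours).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor_feat = rng.normal(size=(8, 16, 16))
    reference_feat = rng.normal(size=(8, 16, 16))
    fused = local_attention(anchor_feat, reference_feat, window=5)
    print(fused.shape)
```

Restricting attention to a small window keeps the cost linear in the number of pixels, which is one plausible reason to pair it with an explicit alignment step: once frames are roughly aligned, the matching feature should lie close to the same spatial position.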
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.