{"title":"CIFFormer:一个上下文信息流导向的结肠直肠息肉分割转换器","authors":"Cunlu Xu , Long Lin , Bin Wang , Jun Liu","doi":"10.1016/j.neucom.2025.130413","DOIUrl":null,"url":null,"abstract":"<div><div>Automatic segmentation of polyps in endoscopic images plays a critical role in the early diagnosis of colorectal cancer. In recent years, Visual Transformers, especially pyramid vision transformers, have achieved remarkable strides and become dominating methods in polyp segmentation. However, due to the high resemblance between polyps and normal tissues in terms of size, appearance, color, and other aspects, the pyramid vision transformer methods still face the challenges of the representation of fine-grained details and identifying highly disguised polyps that could be pivotal in precise segmentation of colorectal polyp. To address these challenges, we propose a novel Contextual Information Flow Guided Transformer (CIFFormer) for colorectal polyp segmentation to reconstruct the architecture of a pyramid vision transformer via a contextual information flow design. Our proposed method utilizes a pyramid-structured encoder to obtain multi-resolution feature maps. To effectively capture the target object’s features at various levels of detail, from coarse-grained global information to fine-grained local information, we design a Global-Local Feature Synergy Fusion module (GLFS). GLFS adopts a progressive fusion strategy, first fusing the features of adjacent hierarchy and then gradually fusing across the hierarchy. This allows the model to utilize the features of different semantic levels better and avoid the information loss caused by direct fusion. In addition, we also introduce a Multi-Density Global-Local Residual Module (MDGL). The multi-density residual units within MDGL improve feature propagation and information flow. By employing both local and global residual learning, the model gains a better ability to capture detailed information at both global and local scales. 
The experimental results demonstrate that our CIFFormer model surpasses 17 benchmark models and achieves state-of-the-art performance on five popular datasets. Furthermore, our model exhibits remarkable performance on two video datasets as well. The source code of this work is available at <span><span>https://github.com/lonlin404/CIFFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"644 ","pages":"Article 130413"},"PeriodicalIF":5.5000,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CIFFormer: A Contextual Information Flow Guided Transformer for colorectal polyp segmentation\",\"authors\":\"Cunlu Xu , Long Lin , Bin Wang , Jun Liu\",\"doi\":\"10.1016/j.neucom.2025.130413\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Automatic segmentation of polyps in endoscopic images plays a critical role in the early diagnosis of colorectal cancer. In recent years, Visual Transformers, especially pyramid vision transformers, have achieved remarkable strides and become dominating methods in polyp segmentation. However, due to the high resemblance between polyps and normal tissues in terms of size, appearance, color, and other aspects, the pyramid vision transformer methods still face the challenges of the representation of fine-grained details and identifying highly disguised polyps that could be pivotal in precise segmentation of colorectal polyp. To address these challenges, we propose a novel Contextual Information Flow Guided Transformer (CIFFormer) for colorectal polyp segmentation to reconstruct the architecture of a pyramid vision transformer via a contextual information flow design. Our proposed method utilizes a pyramid-structured encoder to obtain multi-resolution feature maps. 
To effectively capture the target object’s features at various levels of detail, from coarse-grained global information to fine-grained local information, we design a Global-Local Feature Synergy Fusion module (GLFS). GLFS adopts a progressive fusion strategy, first fusing the features of adjacent hierarchy and then gradually fusing across the hierarchy. This allows the model to utilize the features of different semantic levels better and avoid the information loss caused by direct fusion. In addition, we also introduce a Multi-Density Global-Local Residual Module (MDGL). The multi-density residual units within MDGL improve feature propagation and information flow. By employing both local and global residual learning, the model gains a better ability to capture detailed information at both global and local scales. The experimental results demonstrate that our CIFFormer model surpasses 17 benchmark models and achieves state-of-the-art performance on five popular datasets. Furthermore, our model exhibits remarkable performance on two video datasets as well. 
The source code of this work is available at <span><span>https://github.com/lonlin404/CIFFormer</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"644 \",\"pages\":\"Article 130413\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225010859\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225010859","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Automatic segmentation of polyps in endoscopic images plays a critical role in the early diagnosis of colorectal cancer. In recent years, vision transformers, especially pyramid vision transformers, have made remarkable strides and become the dominant methods in polyp segmentation. However, because polyps closely resemble normal tissue in size, appearance, color, and other aspects, pyramid vision transformer methods still struggle to represent fine-grained details and to identify highly camouflaged polyps, both of which are pivotal for precise colorectal polyp segmentation. To address these challenges, we propose a novel Contextual Information Flow Guided Transformer (CIFFormer) for colorectal polyp segmentation, which reconstructs the architecture of a pyramid vision transformer through a contextual information flow design. Our method uses a pyramid-structured encoder to obtain multi-resolution feature maps. To effectively capture the target object's features at various levels of detail, from coarse-grained global information to fine-grained local information, we design a Global-Local Feature Synergy Fusion module (GLFS). GLFS adopts a progressive fusion strategy, first fusing features of adjacent levels and then gradually fusing across levels; this lets the model better exploit features at different semantic levels and avoids the information loss caused by direct fusion. In addition, we introduce a Multi-Density Global-Local Residual Module (MDGL), whose multi-density residual units improve feature propagation and information flow. By employing both local and global residual learning, the model better captures detailed information at both global and local scales. Experimental results demonstrate that CIFFormer surpasses 17 benchmark models and achieves state-of-the-art performance on five popular datasets, and it also performs remarkably well on two video datasets. The source code of this work is available at https://github.com/lonlin404/CIFFormer.
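To make the two ideas in the abstract concrete, the sketch below illustrates (a) progressive fusion of a coarse-to-fine feature pyramid, fusing adjacent levels first and then cascading onward, and (b) a combined local-plus-global residual path. This is a minimal NumPy illustration of the general concepts only: the shapes, the nearest-neighbour upsampling, the averaging "fuse" operator, and the `global_local_residual` helper are all hypothetical assumptions, not the paper's actual GLFS or MDGL operators.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(coarse, fine):
    """Placeholder fusion operator: upsample the coarser map and average."""
    return 0.5 * (upsample2x(coarse) + fine)

def progressive_fusion(pyramid):
    """Fuse a coarse-to-fine list of feature maps pairwise, then cascade.

    pyramid[0] is the coarsest (lowest-resolution) level; each subsequent
    level doubles the spatial resolution, as in a pyramid encoder.
    """
    fused = pyramid[0]
    for finer in pyramid[1:]:
        fused = fuse(fused, finer)  # adjacent levels first, then onward
    return fused

def global_local_residual(x, transform):
    """Sketch of combined residual learning: a short (local) skip inside
    the module and a long (global) skip around it."""
    y = x + transform(x)   # local residual connection
    return x + transform(y)  # global residual connection

# Four pyramid levels with 8 channels, resolutions 8x8 up to 64x64.
levels = [np.random.rand(8 * 2**i, 8 * 2**i, 8) for i in range(4)]
out = progressive_fusion(levels)
print(out.shape)  # (64, 64, 8)

z = global_local_residual(out, lambda t: 0.1 * t)
```

The cascade mirrors the progressive strategy described above: each fusion step only ever combines two adjacent resolutions, so no level is forced through a single direct merge with all the others.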
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its coverage spans neurocomputing theory, practice, and applications.