{"title":"CIFFormer:一个上下文信息流导向的结肠直肠息肉分割转换器","authors":"Cunlu Xu , Long Lin , Bin Wang , Jun Liu","doi":"10.1016/j.neucom.2025.130413","DOIUrl":null,"url":null,"abstract":"<div><div>Automatic segmentation of polyps in endoscopic images plays a critical role in the early diagnosis of colorectal cancer. In recent years, Visual Transformers, especially pyramid vision transformers, have achieved remarkable strides and become dominating methods in polyp segmentation. However, due to the high resemblance between polyps and normal tissues in terms of size, appearance, color, and other aspects, the pyramid vision transformer methods still face the challenges of the representation of fine-grained details and identifying highly disguised polyps that could be pivotal in precise segmentation of colorectal polyp. To address these challenges, we propose a novel Contextual Information Flow Guided Transformer (CIFFormer) for colorectal polyp segmentation to reconstruct the architecture of a pyramid vision transformer via a contextual information flow design. Our proposed method utilizes a pyramid-structured encoder to obtain multi-resolution feature maps. To effectively capture the target object’s features at various levels of detail, from coarse-grained global information to fine-grained local information, we design a Global-Local Feature Synergy Fusion module (GLFS). GLFS adopts a progressive fusion strategy, first fusing the features of adjacent hierarchy and then gradually fusing across the hierarchy. This allows the model to utilize the features of different semantic levels better and avoid the information loss caused by direct fusion. In addition, we also introduce a Multi-Density Global-Local Residual Module (MDGL). The multi-density residual units within MDGL improve feature propagation and information flow. By employing both local and global residual learning, the model gains a better ability to capture detailed information at both global and local scales. 
The experimental results demonstrate that our CIFFormer model surpasses 17 benchmark models and achieves state-of-the-art performance on five popular datasets. Furthermore, our model exhibits remarkable performance on two video datasets as well. The source code of this work is available at <span><span>https://github.com/lonlin404/CIFFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"644 ","pages":"Article 130413"},"PeriodicalIF":5.5000,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CIFFormer: A Contextual Information Flow Guided Transformer for colorectal polyp segmentation\",\"authors\":\"Cunlu Xu , Long Lin , Bin Wang , Jun Liu\",\"doi\":\"10.1016/j.neucom.2025.130413\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Automatic segmentation of polyps in endoscopic images plays a critical role in the early diagnosis of colorectal cancer. In recent years, Visual Transformers, especially pyramid vision transformers, have achieved remarkable strides and become dominating methods in polyp segmentation. However, due to the high resemblance between polyps and normal tissues in terms of size, appearance, color, and other aspects, the pyramid vision transformer methods still face the challenges of the representation of fine-grained details and identifying highly disguised polyps that could be pivotal in precise segmentation of colorectal polyp. To address these challenges, we propose a novel Contextual Information Flow Guided Transformer (CIFFormer) for colorectal polyp segmentation to reconstruct the architecture of a pyramid vision transformer via a contextual information flow design. Our proposed method utilizes a pyramid-structured encoder to obtain multi-resolution feature maps. 
To effectively capture the target object’s features at various levels of detail, from coarse-grained global information to fine-grained local information, we design a Global-Local Feature Synergy Fusion module (GLFS). GLFS adopts a progressive fusion strategy, first fusing the features of adjacent hierarchy and then gradually fusing across the hierarchy. This allows the model to utilize the features of different semantic levels better and avoid the information loss caused by direct fusion. In addition, we also introduce a Multi-Density Global-Local Residual Module (MDGL). The multi-density residual units within MDGL improve feature propagation and information flow. By employing both local and global residual learning, the model gains a better ability to capture detailed information at both global and local scales. The experimental results demonstrate that our CIFFormer model surpasses 17 benchmark models and achieves state-of-the-art performance on five popular datasets. Furthermore, our model exhibits remarkable performance on two video datasets as well. 
The source code of this work is available at <span><span>https://github.com/lonlin404/CIFFormer</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"644 \",\"pages\":\"Article 130413\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225010859\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225010859","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Automatic segmentation of polyps in endoscopic images plays a critical role in the early diagnosis of colorectal cancer. In recent years, vision transformers, especially pyramid vision transformers, have made remarkable strides and become the dominant methods in polyp segmentation. However, because polyps closely resemble normal tissue in size, appearance, color, and other aspects, pyramid vision transformer methods still struggle to represent fine-grained details and to identify highly camouflaged polyps, both of which are pivotal for precise colorectal polyp segmentation. To address these challenges, we propose a novel Contextual Information Flow Guided Transformer (CIFFormer) for colorectal polyp segmentation, which reconstructs the architecture of a pyramid vision transformer through a contextual information flow design. Our method uses a pyramid-structured encoder to obtain multi-resolution feature maps. To effectively capture the target object's features at various levels of detail, from coarse-grained global information to fine-grained local information, we design a Global-Local Feature Synergy Fusion module (GLFS). GLFS adopts a progressive fusion strategy, first fusing features of adjacent levels and then gradually fusing across levels; this lets the model better exploit features at different semantic levels and avoids the information loss caused by direct fusion. In addition, we introduce a Multi-Density Global-Local Residual Module (MDGL), whose multi-density residual units improve feature propagation and information flow. By employing both local and global residual learning, the model better captures detailed information at both global and local scales. Experimental results demonstrate that CIFFormer surpasses 17 benchmark models and achieves state-of-the-art performance on five popular datasets, and it also performs remarkably well on two video datasets. The source code of this work is available at https://github.com/lonlin404/CIFFormer.
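To make the two ideas in the abstract concrete, the sketch below illustrates (a) progressive fusion of a coarse-to-fine feature pyramid, fusing adjacent levels first and then cascading onward, and (b) a combined local-plus-global residual path. This is a minimal NumPy illustration of the general concepts only: the shapes, the nearest-neighbour upsampling, the averaging "fuse" operator, and the `global_local_residual` helper are all hypothetical assumptions, not the paper's actual GLFS or MDGL operators.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(coarse, fine):
    """Placeholder fusion operator: upsample the coarser map and average."""
    return 0.5 * (upsample2x(coarse) + fine)

def progressive_fusion(pyramid):
    """Fuse a coarse-to-fine list of feature maps pairwise, then cascade.

    pyramid[0] is the coarsest (lowest-resolution) level; each subsequent
    level doubles the spatial resolution, as in a pyramid encoder.
    """
    fused = pyramid[0]
    for finer in pyramid[1:]:
        fused = fuse(fused, finer)  # adjacent levels first, then onward
    return fused

def global_local_residual(x, transform):
    """Sketch of combined residual learning: a short (local) skip inside
    the module and a long (global) skip around it."""
    y = x + transform(x)   # local residual connection
    return x + transform(y)  # global residual connection

# Four pyramid levels with 8 channels, resolutions 8x8 up to 64x64.
levels = [np.random.rand(8 * 2**i, 8 * 2**i, 8) for i in range(4)]
out = progressive_fusion(levels)
print(out.shape)  # (64, 64, 8)

z = global_local_residual(out, lambda t: 0.1 * t)
```

The cascade mirrors the progressive strategy described above: each fusion step only ever combines two adjacent resolutions, so no level is forced through a single direct merge with all the others.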
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its coverage spans neurocomputing theory, practice, and applications.