Xinyu Li , Qiaohong Liu , Xuewei Li , Tiansheng Huang , Min Lin , Xiaoxiang Han , Weikun Zhang , Keyan Chen , Yuanjie Lin
Title: CIFTC-Net: Cross information fusion network with transformer and CNN for polyp segmentation
Journal: Displays (Q1, Computer Science, Hardware & Architecture)
DOI: 10.1016/j.displa.2024.102872
Publication date: 2024-11-02
URL: https://www.sciencedirect.com/science/article/pii/S0141938224002361
Citations: 0
Abstract
Polyp segmentation plays a crucial role in the early diagnosis and treatment of colorectal cancer, the third most common cancer worldwide. Despite the remarkable successes of recent deep learning-based methods, accurate segmentation of polyps remains challenging due to the diversity of their shapes, sizes, and appearances, among other factors. To address these problems, a novel cross information fusion network with Transformer and convolutional neural network (CNN) for polyp segmentation, named CIFTC-Net, is proposed to improve the segmentation performance on colon polyps. In particular, a dual-branch encoder with Pyramid Vision Transformer (PVT) and ResNet50 is employed to take full advantage of both global semantic information and local spatial features, enhancing the feature representation ability. To effectively fuse the two types of features, a new global–local feature fusion (GLFF) module is designed. Additionally, in the PVT branch, a multi-scale feature integration (MSFI) module is introduced to fuse multi-scale features adaptively. At the bottom of the model, a multi-scale atrous pyramid bridging (MSAPB) module is proposed to obtain rich and robust multi-level features and improve segmentation accuracy. Experimental results on four public polyp segmentation datasets demonstrate that CIFTC-Net surpasses current state-of-the-art methods across various metrics, showcasing its superiority in segmentation accuracy, generalization ability, and handling of complex images.
Journal overview:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.