TMPSformer: An Efficient Hybrid Transformer-MLP Network for Polyp Segmentation

Ping Guo, Guoping Liu, Huan Liu
Journal: Mobile Networks and Applications. Published: 2024-09-10. DOI: 10.1007/s11036-024-02411-y (https://doi.org/10.1007/s11036-024-02411-y)

Abstract

Colorectal cancer is a global health risk, and colorectal polyps are often its precursors. Colonoscopy is the primary modality for polyp detection, and precise, real-time segmentation is key to effective diagnosis and surgical planning. Existing segmentation models, such as convolutional neural networks (CNNs) and Transformers, have driven progress but face a trade-off between precision and speed: CNNs excel at local feature extraction yet struggle with global context, while Transformers handle global information well but at a computational cost. To address these constraints, we introduce TMPSformer, a lightweight model tailored for efficient and accurate real-time polyp segmentation. With only 2.7 M parameters, TMPSformer features a hybrid encoder that combines the long-range dependencies captured by Transformers with the local dependencies captured by shift Multi-Layer Perceptrons (MLPs), improving segmentation performance. It also employs an All-MLP decoder to streamline feature fusion and improve decoding efficiency. TMPSformer replaces the traditional attention module with a Flash Efficient Attention (FEA) module, significantly improving real-time performance. A comprehensive evaluation on five public polyp segmentation datasets shows that TMPSformer outperforms existing state-of-the-art algorithms. Specifically, on the Kvasir-SEG dataset, TMPSformer runs in real time at 162 frames per second (FPS) at 512 × 512 resolution on a single NVIDIA RTX 2080 Ti GPU and achieves a mean Intersection over Union (mIoU) of 0.811, surpassing ColonSegNet by 8.7% and SegFormer by 4.8%. TMPSformer also significantly reduces complexity: its parameter count is 1.8× and 31× smaller than those of ColonSegNet and SegFormer, respectively.
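The mIoU figure reported above can be made concrete with a small sketch. The abstract does not state the exact evaluation protocol, so the code below assumes one common convention: compute the Intersection over Union of the polyp (foreground) class for each image, then average over images. The function names `iou` and `mean_iou`, the nested-list mask representation, and the treatment of two empty masks as a perfect match are all illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence

# A 2-D binary mask: 1 marks a polyp pixel, 0 marks background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of the foreground pixels of two masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Average the per-image IoU over a set of predictions (assumed protocol)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")
    return sum(iou(p, t) for p, t in zip(preds, targets)) / len(preds)


# Toy example: image A overlaps on 2 of 4 foreground pixels, image B matches exactly.
pred_a = [[1, 1, 0], [0, 1, 0]]
true_a = [[1, 0, 0], [0, 1, 1]]
pred_b = [[0, 0], [1, 1]]
true_b = [[0, 0], [1, 1]]

print(iou(pred_a, true_a))                             # 0.5
print(mean_iou([pred_a, pred_b], [true_a, true_b]))    # 0.75
```

Under this convention, an mIoU of 0.811 would mean that, on average, the predicted and annotated polyp regions share about 81% of their combined area. Other conventions, such as averaging over foreground and background classes, give different numbers for the same predictions.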

