TMPSformer: An Efficient Hybrid Transformer-MLP Network for Polyp Segmentation

Mobile Networks and Applications, published 2024-09-10. DOI: 10.1007/s11036-024-02411-y

Ping Guo, Guoping Liu, Huan Liu

Abstract

Colorectal cancer poses a global health risk, and colorectal polyps are often its precursors. Colonoscopy is the primary modality for polyp detection, and precise, real-time segmentation is key to effective diagnosis and surgical planning. Existing segmentation models based on convolutional neural networks (CNNs) and Transformers have driven progress but trade precision against speed: CNNs excel at local feature extraction yet struggle to capture global context, while Transformers model global information well but at a high computational cost. To address these constraints, we introduce TMPSformer, a lightweight model tailored for efficient and accurate real-time polyp segmentation. With only 2.7 M parameters, TMPSformer uses a hybrid encoder that combines the long-range dependency modeling of Transformers with the local dependency modeling of shifted Multi-Layer Perceptrons (MLPs), improving segmentation performance. It is also equipped with an All-MLP decoder that streamlines feature fusion and improves decoding efficiency. TMPSformer replaces the standard attention module with a Flash Efficient Attention (FEA) module, which significantly improves real-time performance. A comprehensive evaluation on five public polyp segmentation datasets shows that TMPSformer outperforms existing state-of-the-art algorithms. Specifically, on the Kvasir-SEG dataset TMPSformer runs in real time at 162 frames per second (FPS) at 512 × 512 resolution on a single NVIDIA RTX 2080 Ti GPU and achieves a mean Intersection over Union (mIoU) of 0.811. Its segmentation performance surpasses ColonSegNet by 8.7% and SegFormer by 4.8%. TMPSformer also significantly reduces model complexity, with 1.8× and 31× fewer parameters than ColonSegNet and SegFormer, respectively.
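The abstract does not describe how the shifted MLP is built. As a minimal sketch, assuming a UNeXt-style design (channels split into groups, each group displaced by a different offset along one spatial axis, followed by a per-pixel channel-mixing layer), the pure-Python code below shows why a shift followed by an MLP captures local dependencies: after the shift, each output pixel mixes values from neighboring positions. The names `shift_plane`, `token_shift`, and `channel_mix` and the default offsets (-2 to 2) are hypothetical, not taken from TMPSformer.

```python
from typing import List, Sequence

# One H x W feature plane; a feature map is a list of C planes (channel-first).
# Hypothetical UNeXt-style shift; TMPSformer's actual shift MLP may differ.
Plane = List[List[float]]


def shift_plane(plane: Plane, offset: int, axis: int) -> Plane:
    """Shift one plane by `offset` cells along axis 0 (height) or 1 (width), zero-filling vacated cells."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            si, sj = (i - offset, j) if axis == 0 else (i, j - offset)
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = plane[si][sj]
    return out


def token_shift(x: List[Plane], shifts: Sequence[int] = (-2, -1, 0, 1, 2), axis: int = 1) -> List[Plane]:
    """Split the channels into len(shifts) contiguous groups and shift each group by its own offset."""
    group_size = -(-len(x) // len(shifts))  # ceiling division, as torch.chunk does
    return [shift_plane(plane, shifts[c // group_size], axis) for c, plane in enumerate(x)]


def channel_mix(x: List[Plane], weight: List[List[float]]) -> List[Plane]:
    """Per-pixel linear layer across channels: out[o][i][j] = sum_c weight[o][c] * x[c][i][j]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(w_oc * x[c][i][j] for c, w_oc in enumerate(row)) for j in range(w)] for i in range(h)]
        for row in weight
    ]


# Five channels, each a 1 x 5 row with a single active pixel in the middle.
x = [[[0.0, 0.0, 1.0, 0.0, 0.0]] for _ in range(5)]
ones = [[1.0] * 5]  # one output channel that sums all input channels
print(channel_mix(x, ones))               # without the shift: [[[0.0, 0.0, 5.0, 0.0, 0.0]]]
print(channel_mix(token_shift(x), ones))  # with the shift:    [[[1.0, 1.0, 1.0, 1.0, 1.0]]]
```

Without the shift, the channel MLP only sees the pixel itself; with it, the single active pixel influences a five-pixel window, which is the local receptive field the shifted MLP contributes alongside the Transformer's global attention.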
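The abstract names a Flash Efficient Attention (FEA) module but does not give its internals. Assuming "Flash" refers to FlashAttention-style computation, the sketch below illustrates that technique's central idea, an online softmax that visits keys and values block by block and never materializes the full row of attention scores, yet returns the same result as standard scaled dot-product attention. It is a swapped-in, single-query illustration in pure Python with hypothetical function names; production implementations fuse these steps into GPU kernels and tile the queries as well.

```python
import math
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Reference scaled dot-product attention for one query: all scores are materialized at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(wt * v[t] for wt, v in zip(weights, values)) / total for t in range(len(values[0]))]


def tiled_attention(q: Vec, keys: List[Vec], values: List[Vec], block: int = 2) -> Vec:
    """Same result, visiting keys/values one block at a time with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                  # running maximum score
    denom = 0.0                    # running sum of exp(score - m)
    acc = [0.0] * len(values[0])   # running un-normalized weighted sum of values
    for start in range(0, len(keys), block):
        ks, vs = keys[start:start + block], values[start:start + block]
        scores = [dot(q, k) * scale for k in ks]
        m_new = max(m, max(scores))
        rescale = math.exp(m - m_new)  # rescale earlier partial sums to the new max (0.0 on the first block)
        p = [math.exp(s - m_new) for s in scores]
        denom = denom * rescale + sum(p)
        acc = [a * rescale + sum(pi * v[t] for pi, v in zip(p, vs)) for t, a in enumerate(acc)]
        m = m_new
    return [a / denom for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.2, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
ref = naive_attention(q, keys, values)
for b in (1, 2, 3, 5):
    out = tiled_attention(q, keys, values, block=b)
    assert all(abs(x - y) < 1e-9 for x, y in zip(ref, out))  # identical up to rounding
```

The memory saving comes from keeping only `m`, `denom`, and `acc` per query instead of the full score row; the arithmetic result is unchanged, which is why such attention can be a drop-in replacement for the standard module.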
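The abstract reports an mIoU of 0.811 without defining it, and conventions differ: some polyp-segmentation papers average the polyp-class IoU over test images, while others average IoU over the polyp and background classes. A minimal sketch of the first convention follows; the 0.5 probability threshold, the rule that two empty masks score 1.0, and the function names are assumptions for illustration.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # H x W binary mask: 1 = polyp, 0 = background


def binarize(prob: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a predicted probability map into a binary mask (the 0.5 threshold is an assumption)."""
    return [[int(p >= threshold) for p in row] for row in prob]


def iou(pred: Mask, gt: Mask) -> float:
    """Foreground IoU = |pred AND gt| / |pred OR gt|; two empty masks count as a perfect match (assumed)."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(p == 1 and g == 1)
            union += int(p == 1 or g == 1)
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-image foreground IoU over a test set."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


pred = binarize([[0.9, 0.7], [0.2, 0.1]])  # [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
print(iou(pred, gt))  # 0.5 (1 overlapping pixel / 2 pixels in the union)
print(mean_iou([pred, [[1, 1], [1, 1]]], [gt, [[1, 1], [1, 1]]]))  # 0.75
```

Averaging over classes instead would also count background IoU, which is usually high for small polyps, so the two conventions can give noticeably different numbers for the same predictions.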
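The throughput and size figures in the abstract can be read back as a per-frame latency and approximate baseline sizes. The sketch below assumes that "2.7 M" counts parameters and that the reported ratios apply to that rounded figure, so the derived baseline sizes are approximations, not values reported by the authors. `measure_fps` is a generic, hypothetical timing harness, not the authors' benchmarking protocol.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object], frames: Sequence[object], warmup: int = 10) -> float:
    """Frames per second over `frames`, after `warmup` untimed calls on the first frame.

    This sketch times CPU work only. When timing a GPU model, the device must be
    synchronized before each clock read, or asynchronous kernel launches inflate the FPS.
    """
    for _ in range(warmup):
        infer(frames[0])
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    return len(frames) / (time.perf_counter() - start)


def latency_ms(fps: float) -> float:
    """Average time per frame, in milliseconds, implied by a throughput figure."""
    return 1000.0 / fps


def implied_params(model_params_m: float, ratio: float) -> float:
    """Parameter count (in millions) of a baseline that is `ratio` times larger than the model."""
    return model_params_m * ratio


print(round(latency_ms(162), 2))           # 6.17 ms per 512 x 512 frame
print(round(implied_params(2.7, 1.8), 2))  # 4.86 (ColonSegNet, approximate)
print(round(implied_params(2.7, 31), 1))   # 83.7 (SegFormer, approximate)

# Dummy workload standing in for a model forward pass (hypothetical).
fps = measure_fps(lambda frame: sum(frame), [list(range(10_000))] * 50)
```

Warm-up calls are excluded from timing because first calls often include one-time costs such as memory allocation, which would understate steady-state throughput.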

