{"title":"TMPSformer: An Efficient Hybrid Transformer-MLP Network for Polyp Segmentation","authors":"Ping Guo, Guoping Liu, Huan Liu","doi":"10.1007/s11036-024-02411-y","DOIUrl":null,"url":null,"abstract":"<p>Colorectal cancer poses a global health risk, often heralded by colorectal polyps. Colonoscopy is the primary modality for polyp detection, with precise, real-time segmentation being key to effective diagnosis and surgical planning. Existing segmentation models like convolutional neural networks (CNNs) and Transformers have propelled progress but face trade-offs between precision and speed. CNNs excel in local feature extraction yet struggle with global context, while Transformers handle global information well but at a computational cost. Addressing these constraints, we introduce TMPSformer, a groundbreaking lightweight model tailored for efficient and accurate real-time polyp segmentation. TMPSformer, with its compact size of only 2.7 M, features a pioneering hybrid encoder merging Transformers’ long-range dependencies and shift Multi-Layer Perceptrons (MLPs)’ local dependencies, effectively enhancing segmentation performance. It also equips an All-MLP decoder to streamline feature fusion and enhance decoding efficiency. TMPSformer utilizes the Flash Efficient Attention (FEA) module to replace the traditional Attention module, significantly improving real-time performance. A comprehensive evaluation on five public polyp segmentation datasets demonstrated TMPSformer’s superiority over existing state-of-the-art algorithms. Specifically, TMPSformer achieves real-time processing at 162 frames per second (FPS) at 512 × 512 resolution on the Kvasir-SEG dataset using a single NVIDIA RTX 2080 Ti GPU, and achieves a mean Intersection over Union (mIoU) of 0.811. Its segmentation performance surpasses ColonSegNet by 8.7% and SegFormer by 4.8%. Additionally, TMPSformer significantly reduces complexity, cutting the parameter count by 1.8× and 31× compared to ColonSegNet and SegFormer, respectively.</p>","PeriodicalId":501103,"journal":{"name":"Mobile Networks and Applications","volume":"56 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mobile Networks and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11036-024-02411-y","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Colorectal cancer poses a global health risk and is often heralded by colorectal polyps. Colonoscopy is the primary modality for polyp detection, and precise, real-time segmentation is key to effective diagnosis and surgical planning. Existing segmentation models such as convolutional neural networks (CNNs) and Transformers have propelled progress but face trade-offs between precision and speed: CNNs excel at local feature extraction yet struggle with global context, while Transformers capture global information well but at a high computational cost. Addressing these constraints, we introduce TMPSformer, a lightweight model tailored for efficient and accurate real-time polyp segmentation. With a compact size of only 2.7 M parameters, TMPSformer features a hybrid encoder that merges the long-range dependency modeling of Transformers with the local dependency modeling of shift Multi-Layer Perceptrons (MLPs), effectively enhancing segmentation performance. It also incorporates an All-MLP decoder to streamline feature fusion and improve decoding efficiency, and it replaces the traditional attention module with a Flash Efficient Attention (FEA) module, significantly improving real-time performance. A comprehensive evaluation on five public polyp segmentation datasets demonstrates TMPSformer's superiority over existing state-of-the-art algorithms. Specifically, TMPSformer runs in real time at 162 frames per second (FPS) at 512 × 512 resolution on the Kvasir-SEG dataset using a single NVIDIA RTX 2080 Ti GPU and achieves a mean Intersection over Union (mIoU) of 0.811, surpassing ColonSegNet by 8.7% and SegFormer by 4.8% in segmentation performance. Additionally, TMPSformer significantly reduces model complexity, cutting the parameter count by 1.8× and 31× compared to ColonSegNet and SegFormer, respectively.
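
The abstract does not include implementation details, so the following PyTorch sketch is only a rough illustration of how a hybrid Transformer/shift-MLP encoder block of this kind might be structured: a memory-efficient attention step for global context (using torch.nn.functional.scaled_dot_product_attention, which dispatches to FlashAttention-style kernels where supported, standing in for the paper's FEA module) followed by a spatial-shift MLP for local context. All class names, shift sizes, and dimensions here are assumptions, not the authors' code.

```python
# Illustrative sketch only: the exact TMPSformer layers are not given in the
# abstract, so module names, shift sizes, and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShiftMLP(nn.Module):
    """Token-mixing MLP that rolls channel groups along the spatial axes,
    giving the MLP a local receptive field (in the spirit of shift-MLP blocks)."""

    def __init__(self, dim: int, shift: int = 1, expansion: int = 4):
        super().__init__()
        self.shift = shift
        self.fc1 = nn.Linear(dim, dim * expansion)
        self.fc2 = nn.Linear(dim * expansion, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, c = x.shape                          # tokens laid out as (B, H*W, C)
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        chunks = torch.chunk(feat, 4, dim=1)       # shift 4 channel groups in 4 directions
        shifts = [(self.shift, -1), (-self.shift, -1), (self.shift, -2), (-self.shift, -2)]
        feat = torch.cat([torch.roll(g, s, dims=d) for g, (s, d) in zip(chunks, shifts)], dim=1)
        x = feat.reshape(b, c, n).transpose(1, 2)
        return self.fc2(F.gelu(self.fc1(x)))


class HybridBlock(nn.Module):
    """Hypothetical hybrid encoder block: efficient global attention followed by
    a shift MLP for local context."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.heads = heads
        self.mlp = ShiftMLP(dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, c = x.shape
        qkv = self.qkv(self.norm1(x)).reshape(b, n, 3, self.heads, c // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)       # each: (B, heads, N, head_dim)
        # scaled_dot_product_attention uses a FlashAttention kernel when available.
        attn = F.scaled_dot_product_attention(q, k, v)
        x = x + self.proj(attn.transpose(1, 2).reshape(b, n, c))
        return x + self.mlp(self.norm2(x), h, w)


if __name__ == "__main__":
    block = HybridBlock(dim=64)
    tokens = torch.randn(1, 32 * 32, 64)           # 32x32 feature map flattened to tokens
    print(block(tokens, 32, 32).shape)             # torch.Size([1, 1024, 64])
```

In a full encoder, stacks of such blocks at several resolutions would feed an All-MLP decoder that projects and fuses the multi-scale features with linear layers only; that decoder is likewise not specified in the abstract and is omitted from this sketch.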