Authors: Ping Guo, Guoping Liu, Huan Liu
Journal: Mobile Networks and Applications
Published: 2024-09-10
DOI: 10.1007/s11036-024-02411-y
TMPSformer: An Efficient Hybrid Transformer-MLP Network for Polyp Segmentation
Colorectal cancer poses a global health risk and is often heralded by colorectal polyps. Colonoscopy is the primary modality for polyp detection, and precise, real-time segmentation is key to effective diagnosis and surgical planning. Existing segmentation models such as convolutional neural networks (CNNs) and Transformers have driven progress but face a trade-off between precision and speed. CNNs excel at local feature extraction yet struggle with global context, while Transformers handle global information well but at a high computational cost. To address these constraints, we introduce TMPSformer, a lightweight model tailored for efficient and accurate real-time polyp segmentation. With only 2.7 M parameters, TMPSformer features a hybrid encoder that combines the long-range dependencies captured by Transformers with the local dependencies captured by shift Multi-Layer Perceptrons (MLPs), which improves segmentation performance. It also employs an All-MLP decoder to streamline feature fusion and improve decoding efficiency. TMPSformer replaces the traditional Attention module with a Flash Efficient Attention (FEA) module, significantly improving real-time performance. A comprehensive evaluation on five public polyp segmentation datasets shows that TMPSformer outperforms existing state-of-the-art algorithms. Specifically, on the Kvasir-SEG dataset TMPSformer runs in real time at 162 frames per second (FPS) at 512 × 512 resolution on a single NVIDIA RTX 2080 Ti GPU and achieves a mean Intersection over Union (mIoU) of 0.811. Its segmentation performance surpasses ColonSegNet by 8.7% and SegFormer by 4.8%. Additionally, TMPSformer significantly reduces complexity, cutting the parameter count by 1.8× and 31× compared to ColonSegNet and SegFormer, respectively.
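To make the two headline numbers concrete, here is a minimal Python sketch of how an mIoU score and a per-frame time budget can be computed. It is not the authors' evaluation code. The abstract does not say how mIoU is averaged, so the sketch assumes it is the IoU of the polyp class averaged over images. The names `iou`, `mean_iou`, and `frame_budget_ms` are hypothetical, and the toy masks are illustrative only.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = polyp, 0 = background


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Average per-image IoU over a dataset (assumed definition of mIoU)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally many, non-empty predictions and targets")
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


def frame_budget_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds for a given throughput."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Toy 2 x 3 masks: 2 overlapping pixels, 4 pixels in the union.
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    print(iou(pred, target))               # 0.5
    print(round(frame_budget_ms(162), 2))  # 6.17
```

At the reported 162 FPS, the whole pipeline has roughly 6.17 ms per 512 × 512 frame, which shows how much headroom there is over a typical 30 FPS video stream (about 33 ms per frame).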