HAFormer：为轻量级语义分割释放层次意识特征的力量

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society Pub Date : 2024-07-15 DOI:10.1109/TIP.2024.3425048

Guoan Xu;Wenjing Jia;Tao Wu;Ligeng Chen;Guangwei Gao

{"title":"HAFormer：为轻量级语义分割释放层次意识特征的力量","authors":"Guoan Xu;Wenjing Jia;Tao Wu;Ligeng Chen;Guangwei Gao","doi":"10.1109/TIP.2024.3425048","DOIUrl":null,"url":null,"abstract":"Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for enhancement, particularly when considering constraints on computational resources. In this paper, we introduce HAFormer, a model that combines the hierarchical features extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges. Specifically, we design a Hierarchy-Aware Pixel-Excitation (HAPE) module for adaptive multi-scale local feature extraction. During the global perception modeling, we devise an Efficient Transformer (ET) module streamlining the quadratic calculations associated with traditional Transformers. Moreover, a correlation-weighted Fusion (cwF) module selectively merges diverse feature representations, significantly enhancing predictive accuracy. HAFormer achieves high performance with minimal computational overhead and compact model size, achieving 74.2% mIoU on Cityscapes and 71.1% mIoU on CamVid test datasets, with frame rates of 105FPS and 118FPS on a single 2080Ti GPU. The source codes are available at \n<uri>https://github.com/XU-GITHUB-curry/HAFormer</uri>\n.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"4202-4214"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation\",\"authors\":\"Guoan Xu;Wenjing Jia;Tao Wu;Ligeng Chen;Guangwei Gao\",\"doi\":\"10.1109/TIP.2024.3425048\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for enhancement, particularly when considering constraints on computational resources. In this paper, we introduce HAFormer, a model that combines the hierarchical features extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges. Specifically, we design a Hierarchy-Aware Pixel-Excitation (HAPE) module for adaptive multi-scale local feature extraction. During the global perception modeling, we devise an Efficient Transformer (ET) module streamlining the quadratic calculations associated with traditional Transformers. Moreover, a correlation-weighted Fusion (cwF) module selectively merges diverse feature representations, significantly enhancing predictive accuracy. HAFormer achieves high performance with minimal computational overhead and compact model size, achieving 74.2% mIoU on Cityscapes and 71.1% mIoU on CamVid test datasets, with frame rates of 105FPS and 118FPS on a single 2080Ti GPU. The source codes are available at \\n<uri>https://github.com/XU-GITHUB-curry/HAFormer</uri>\\n.\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"33 \",\"pages\":\"4202-4214\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10599205/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10599205/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

卷积神经网络（CNN）和变换器在语义分割任务中都取得了巨大成功。人们一直在努力将 CNN 与 Transformer 模型相结合，以捕捉局部和全局上下文的交互。但是，仍有改进的余地，尤其是在考虑到计算资源的限制时。在本文中，我们介绍了 HAFormer，这是一种将 CNN 的分层特征提取能力与 Transformer 的全局依赖关系建模能力相结合的模型，用于应对轻量级语义分割挑战。具体来说，我们设计了一个分层感知像素激发（HAPE）模块，用于自适应多尺度局部特征提取。在全局感知建模过程中，我们设计了一个高效变换器（ET）模块，简化了与传统变换器相关的二次计算。此外，相关加权融合（cwF）模块可选择性地合并不同的特征表征，从而显著提高预测准确性。HAFormer 以最小的计算开销和紧凑的模型尺寸实现了高性能，在 Cityscapes 和 CamVid 测试数据集上分别实现了 74.2% 和 71.1% 的 mIoU，单个 2080Ti GPU 的帧速率分别为 105FPS 和 118FPS。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation

Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for enhancement, particularly when considering constraints on computational resources. In this paper, we introduce HAFormer, a model that combines the hierarchical features extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges. Specifically, we design a Hierarchy-Aware Pixel-Excitation (HAPE) module for adaptive multi-scale local feature extraction. During the global perception modeling, we devise an Efficient Transformer (ET) module streamlining the quadratic calculations associated with traditional Transformers. Moreover, a correlation-weighted Fusion (cwF) module selectively merges diverse feature representations, significantly enhancing predictive accuracy. HAFormer achieves high performance with minimal computational overhead and compact model size, achieving 74.2% mIoU on Cityscapes and 71.1% mIoU on CamVid test datasets, with frame rates of 105FPS and 118FPS on a single 2080Ti GPU. The source codes are available at https://github.com/XU-GITHUB-curry/HAFormer .

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

自引率

0.00%

发文量