Couplformer: Rethinking Vision Transformer with Coupling Attention

2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) | Pub Date: 2023-01-01 | DOI: 10.1109/WACV56688.2023.00641

Hai Lan, Xihao Wang, Hao Shen, Peidong Liang, Xian Wei


Citations: 3

Abstract

With the development of the self-attention mechanism, the Transformer model has demonstrated outstanding performance in the computer vision domain. However, the heavy computation required by the full attention mechanism places a large burden on memory. Consequently, memory limits hinder deployment of the Transformer model on embedded systems, where computing resources are limited. To remedy this problem, we propose a novel memory-efficient attention mechanism named Couplformer, which decouples the attention map into two sub-matrices and generates the alignment scores from spatial information. Our method enables the Transformer model to improve time and memory efficiency while maintaining expressive power. We evaluate the model on a series of image classification tasks at different scales. Experiments show that on the ImageNet-1K classification task, the Couplformer reduces memory consumption by 42% compared with the regular Transformer. Meanwhile, it meets accuracy requirements: when occupying the same memory footprint, it outperforms the regular Transformer by 0.56% in Top-1 accuracy. Besides, the Couplformer achieves state-of-the-art performance in MS COCO 2017 object detection and instance segmentation tasks. As a result, the Couplformer can serve as an efficient backbone in visual tasks and provides researchers with a novel perspective on deploying attention mechanisms.
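The abstract states only that the attention map is decoupled into two sub-matrices; it does not say how they are built or recombined. The sketch below is one possible reading, not the authors' implementation. It assumes the two sub-matrices act on the row and column axes of an `h x w` token grid and that the full map corresponds to their Kronecker product. The names `a_rows`, `a_cols`, `factored_attention`, and `full_attention`, the 14 x 14 grid, and the random sub-matrices are all hypothetical choices made for illustration. It shows that, under this assumption, the factored map can be applied without ever materialising the full `(h*w) x (h*w)` map.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def factored_attention(a_rows, a_cols, values):
    """Apply the map kron(a_rows, a_cols) to values without building it.

    a_rows: (h, h) row-stochastic sub-matrix over the grid's row axis.
    a_cols: (w, w) row-stochastic sub-matrix over the grid's column axis.
    values: (h, w, d) value vectors laid out on the h x w token grid.
    """
    return np.einsum("ik,jl,kld->ijd", a_rows, a_cols, values)


def full_attention(a_rows, a_cols, values):
    """Reference path: materialise the (h*w, h*w) map, then apply it."""
    h, w, d = values.shape
    full_map = np.kron(a_rows, a_cols)  # shape (h*w, h*w)
    return (full_map @ values.reshape(h * w, d)).reshape(h, w, d)


# Hypothetical sizes: a 14 x 14 token grid with 8 channels per token.
h, w, d = 14, 14, 8
rng = np.random.default_rng(0)

# Random stand-ins for the two sub-matrices (assumption: the real ones
# would be computed from queries and keys, which the abstract does not detail).
a_rows = softmax(rng.normal(size=(h, h)))
a_cols = softmax(rng.normal(size=(w, w)))
values = rng.normal(size=(h, w, d))

out_factored = factored_attention(a_rows, a_cols, values)
out_full = full_attention(a_rows, a_cols, values)

full_entries = (h * w) ** 2       # entries in one full attention map
factored_entries = h * h + w * w  # entries in the two sub-matrices
```

Under these assumptions both paths give the same output, while the stored map shrinks from `(h*w)**2` entries to `h*h + w*w`. This count covers map entries only and is not comparable to the 42% end-to-end memory reduction reported in the abstract.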

