{"title":"Agent Attention: On the Integration of Softmax and Linear Attention","authors":"Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang","doi":"arxiv-2312.08874","DOIUrl":null,"url":null,"abstract":"The attention module is the key component in Transformers. While the global\nattention mechanism offers high expressiveness, its excessive computational\ncost restricts its applicability in various scenarios. In this paper, we\npropose a novel attention paradigm, Agent Attention, to strike a favorable\nbalance between computational efficiency and representation power.\nSpecifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$,\nintroduces an additional set of agent tokens $A$ into the conventional\nattention module. The agent tokens first act as the agent for the query tokens\n$Q$ to aggregate information from $K$ and $V$, and then broadcast the\ninformation back to $Q$. Given the number of agent tokens can be designed to be\nmuch smaller than the number of query tokens, the agent attention is\nsignificantly more efficient than the widely adopted Softmax attention, while\npreserving global context modelling capability. Interestingly, we show that the\nproposed agent attention is equivalent to a generalized form of linear\nattention. Therefore, agent attention seamlessly integrates the powerful\nSoftmax attention and the highly efficient linear attention. Extensive\nexperiments demonstrate the effectiveness of agent attention with various\nvision Transformers and across diverse vision tasks, including image\nclassification, object detection, semantic segmentation and image generation.\nNotably, agent attention has shown remarkable performance in high-resolution\nscenarios, owning to its linear attention nature. For instance, when applied to\nStable Diffusion, our agent attention accelerates generation and substantially\nenhances image generation quality without any additional training. Code is\navailable at https://github.com/LeapLabTHU/Agent-Attention.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":"21 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2312.08874","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The attention module is the key component in Transformers. While the global
attention mechanism offers high expressiveness, its excessive computational
cost restricts its applicability in various scenarios. In this paper, we
propose a novel attention paradigm, Agent Attention, to strike a favorable
balance between computational efficiency and representation power.
Specifically, Agent Attention, denoted as a quadruple $(Q, A, K, V)$,
introduces an additional set of agent tokens $A$ into the conventional
attention module. The agent tokens first act as agents for the query tokens
$Q$, aggregating information from $K$ and $V$, and then broadcast that
information back to $Q$. Since the number of agent tokens can be made
much smaller than the number of query tokens, agent attention is
significantly more efficient than the widely adopted Softmax attention while
preserving global context modelling capability. Interestingly, we show that the
proposed agent attention is equivalent to a generalized form of linear
attention. Therefore, agent attention seamlessly integrates the powerful
Softmax attention and the highly efficient linear attention. Extensive
experiments demonstrate the effectiveness of agent attention with various
vision Transformers and across diverse vision tasks, including image
classification, object detection, semantic segmentation and image generation.
Notably, agent attention shows remarkable performance in high-resolution
scenarios, owing to its linear attention nature. For instance, when applied to
Stable Diffusion, our agent attention accelerates generation and substantially
enhances image generation quality without any additional training. Code is
available at https://github.com/LeapLabTHU/Agent-Attention.
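
To make the mechanism concrete, below is a minimal sketch of the two-step computation described in the abstract: agent aggregation, $\sigma(A K^\top/\sqrt{d})\,V$, followed by agent broadcast, $\sigma(Q A^\top/\sqrt{d})$ applied to the aggregated values, where $\sigma$ denotes row-wise Softmax. Composing the two steps gives $\sigma(Q A^\top/\sqrt{d})\,\sigma(A K^\top/\sqrt{d})\,V$, whose two-factor structure is what the abstract refers to as a generalized form of linear attention. This is an illustrative sketch in plain PyTorch, not the authors' released implementation; the function name agent_attention and the pooling used to form the agent tokens here are assumptions for the example.

```python
# Sketch of the two-step agent attention described in the abstract:
# agent aggregation followed by agent broadcast. Illustrative only.
import torch
import torch.nn.functional as F


def agent_attention(q, k, v, a):
    """q, k, v: (batch, n, d) query/key/value tokens; a: (batch, m, d) agent
    tokens with m << n. Returns an output of shape (batch, n, d)."""
    d = q.shape[-1]
    scale = d ** -0.5
    # Agent aggregation: agents act as queries over K and V (Softmax attention).
    agent_v = F.softmax(a @ k.transpose(-2, -1) * scale, dim=-1) @ v   # (b, m, d)
    # Agent broadcast: queries attend to the agents to read the aggregated info.
    out = F.softmax(q @ a.transpose(-2, -1) * scale, dim=-1) @ agent_v  # (b, n, d)
    return out


if __name__ == "__main__":
    b, n, m, d = 2, 4096, 49, 64
    q = torch.randn(b, n, d)
    k = torch.randn(b, n, d)
    v = torch.randn(b, n, d)
    # One simple way to obtain agent tokens is to pool the queries; this is an
    # assumption for the example, not necessarily how the paper defines A.
    a = F.adaptive_avg_pool1d(q.transpose(1, 2), m).transpose(1, 2)     # (b, m, d)
    print(agent_attention(q, k, v, a).shape)  # torch.Size([2, 4096, 64])
```

Because both Softmax operations involve an $n \times m$ score matrix (with $m$ agent tokens and $n$ query tokens), the cost scales linearly in $n$ for fixed $m$, in contrast to the $n \times n$ score matrix of standard Softmax attention.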