DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention

IF 8.9 · JCR Region 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Bosheng Qin;Juncheng Li;Siliang Tang;Yueting Zhuang
{"title":"DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention","authors":"Bosheng Qin;Juncheng Li;Siliang Tang;Yueting Zhuang","doi":"10.1109/TNNLS.2025.3527046","DOIUrl":null,"url":null,"abstract":"Many studies have aimed to improve Transformer model efficiency using low-rank-based methods that compress sequence length with predetermined or learned compression matrices. However, these methods fix compression coefficients for tokens in the same position during inference, ignoring sequence-specific variations. They also overlook the impact of hidden state dimensions on efficiency gains. To address these limitations, we propose dynamic bilinear low-rank attention (DBA), an efficient and effective attention mechanism that compresses sequence length using input-sensitive dynamic compression matrices. DBA achieves linear time and space complexity by jointly optimizing sequence length and hidden state dimension while maintaining state-of-the-art performance. Specifically, we demonstrate through experiments and the properties of low-rank matrices that sequence length can be compressed with compression coefficients dynamically determined by the input sequence. In addition, we illustrate that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, thereby introducing only a small amount of error. DBA optimizes the attention mechanism through bilinear forms that consider both the sequence length and hidden state dimension. Moreover, the theoretical analysis substantiates that DBA excels at capturing high-order relationships in cross-attention problems. Experimental results across different tasks with varied sequence length conditions demonstrate that DBA achieves state-of-the-art performance compared to several robust baselines. DBA also maintains higher processing speed and lower memory usage, highlighting its efficiency and effectiveness across diverse applications.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"36 8","pages":"14493-14507"},"PeriodicalIF":8.9000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10843139/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Many studies have aimed to improve Transformer model efficiency using low-rank-based methods that compress sequence length with predetermined or learned compression matrices. However, these methods fix compression coefficients for tokens in the same position during inference, ignoring sequence-specific variations. They also overlook the impact of hidden state dimensions on efficiency gains. To address these limitations, we propose dynamic bilinear low-rank attention (DBA), an efficient and effective attention mechanism that compresses sequence length using input-sensitive dynamic compression matrices. DBA achieves linear time and space complexity by jointly optimizing sequence length and hidden state dimension while maintaining state-of-the-art performance. Specifically, we demonstrate through experiments and the properties of low-rank matrices that sequence length can be compressed with compression coefficients dynamically determined by the input sequence. In addition, we illustrate that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, thereby introducing only a small amount of error. DBA optimizes the attention mechanism through bilinear forms that consider both the sequence length and hidden state dimension. Moreover, the theoretical analysis substantiates that DBA excels at capturing high-order relationships in cross-attention problems. Experimental results across different tasks with varied sequence length conditions demonstrate that DBA achieves state-of-the-art performance compared to several robust baselines. DBA also maintains higher processing speed and lower memory usage, highlighting its efficiency and effectiveness across diverse applications.
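The abstract does not reproduce the paper's equations, but the idea it describes, an input-dependent compression of the sequence-length dimension combined with a Johnson-Lindenstrauss-style reduction of the hidden state dimension, can be sketched in a few lines of PyTorch. The sketch below illustrates that general pattern only and is not the authors' DBA implementation; the class name, the `k_landmarks` and `proj_dim` parameters, and the softmax-based compression coefficients are assumptions made for illustration.

```python
# Minimal sketch of dynamic low-rank attention, based only on the abstract above.
# All names (DynamicLowRankAttention, k_landmarks, proj_dim) are hypothetical;
# the paper's actual DBA formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicLowRankAttention(nn.Module):
    def __init__(self, d_model: int, k_landmarks: int = 64, proj_dim: int = 32):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Produces input-dependent compression coefficients over the sequence length.
        self.compress = nn.Linear(d_model, k_landmarks)
        # Fixed random projection of the hidden dimension (Johnson-Lindenstrauss style).
        self.register_buffer("jl", torch.randn(d_model, proj_dim) / proj_dim ** 0.5)
        self.scale = proj_dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        # Dynamic compression matrix P: (batch, k_landmarks, n), computed from the input.
        p = F.softmax(self.compress(x).transpose(1, 2), dim=-1)

        # Compress keys/values along the sequence length: (batch, k_landmarks, d_model).
        k_c, v_c = p @ k, p @ v

        # Reduce the hidden dimension of queries and compressed keys before the dot product.
        q_r, k_r = q @ self.jl, k_c @ self.jl

        # Attention over k_landmarks compressed positions: O(n * k_landmarks) per batch.
        attn = F.softmax(q_r @ k_r.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v_c  # (batch, n, d_model)


if __name__ == "__main__":
    layer = DynamicLowRankAttention(d_model=128, k_landmarks=16, proj_dim=32)
    out = layer(torch.randn(2, 512, 128))
    print(out.shape)  # torch.Size([2, 512, 128])
```

Because both the number of compressed positions and the projected hidden dimension are constants independent of the sequence length, the cost of the attention product grows linearly with the number of tokens, which matches the linear time and space complexity claimed in the abstract.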
Source Journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Annual articles: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.