Differentiable Slimming for Memory-Efficient Transformers

IF 1.7 · Zone 4, Computer Science · JCR Q3: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Nikolay Penkov;Konstantinos Balaskas;Martin Rapp;Joerg Henkel
DOI: 10.1109/LES.2023.3299638
Journal: IEEE Embedded Systems Letters, vol. 15, no. 4, pp. 186-189
Published: 2023-09-25
Publication type: Journal Article
Open access: no
Source: https://ieeexplore.ieee.org/document/10261943/
Citation count: 0

Abstract

Transformer models continue to achieve state-of-the-art performance on a wide range of benchmarks. To meet demanding performance targets, the number of model parameters keeps increasing. As a result, state-of-the-art Transformers require substantial computational resources, which prohibits their deployment on consumer-grade hardware. In the literature, overparameterized Transformers have been successfully reduced in size with pruning strategies. However, existing works cannot optimize the full architecture in a fully differentiable manner without incurring significant overheads. Our work proposes a single-stage approach for training a Transformer for memory-efficient inference in various resource-constrained scenarios. Transformer blocks are extended with trainable gate parameters, which attribute importance and control information flow. Integrating these gates into a differentiable pruning-aware training scheme allows extremely sparse subnetworks to be extracted at runtime with minimal performance degradation. Pruning results at the attention-head and layer levels illustrate the memory efficiency of our trained subnetworks under various memory budgets.
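To make the gating idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not specify how the gates are parameterized or regularized, so the sigmoid gate, the additive sparsity penalty, and the top-k selection under a head budget are all assumptions, and every function name here is hypothetical.

```python
import math

def sigmoid(x):
    """Map a real-valued gate logit to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_head_sum(head_outputs, gate_logits):
    """Scale each head's output vector by its gate and sum over heads.

    Assumption: one scalar gate per attention head, applied multiplicatively.
    """
    gates = [sigmoid(g) for g in gate_logits]
    dim = len(head_outputs[0])
    return [sum(g * h[i] for g, h in zip(gates, head_outputs)) for i in range(dim)]

def sparsity_penalty(gate_logits, weight=0.01):
    """Assumed regularizer: penalizes open gates so unimportant heads close."""
    return weight * sum(sigmoid(g) for g in gate_logits)

def select_heads(gate_logits, budget):
    """Return the indices of the `budget` heads with the largest gates."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    return sorted(ranked[:budget])

# Toy example: three heads with two-dimensional outputs.
head_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate_logits = [2.0, -3.0, 0.5]

mixed = gated_head_sum(head_outputs, gate_logits)  # differentiable in the logits
penalty = sparsity_penalty(gate_logits)            # would be added to the task loss
kept = select_heads(gate_logits, budget=2)         # subnetwork for a 2-head budget
print(kept)
```

In this sketch the gates stay continuous during training, so gradients reach them through the task loss, and a hard selection is made only when a subnetwork is extracted for a given memory budget. The same pattern could be applied with one gate per layer instead of one per head.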
Source journal
IEEE Embedded Systems Letters (Engineering: Control and Systems Engineering)
CiteScore: 3.30
Self-citation rate: 0.00%
Articles published: 65
Journal description: The IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.