TPrune

IF 2.9, JCR Q3, Computer Science, Interdisciplinary Applications

ACM Transactions on Cyber-Physical Systems, Pub Date: 2021-04-15, DOI: 10.1145/3446640

Jiachen Mao, Huanrui Yang, Ang Li, H. Li, Yiran Chen


Citations: 21

Abstract

The invention of the Transformer model structure has boosted the performance of Neural Machine Translation (NMT) tasks to an unprecedented level. Much previous work has aimed to make the Transformer model more execution-friendly on resource-constrained platforms. This research can be categorized into three key fields: Model Pruning, Transfer Learning, and Efficient Transformer Variants. The family of model pruning methods is popular for its simplicity in practice and promising compression rates, and it has achieved great success in the field of convolutional neural networks (CNNs) for many vision tasks. Nonetheless, previous Transformer pruning works did not perform a thorough model analysis and evaluation of each Transformer component on off-the-shelf mobile devices. In this work, we analyze and prune Transformer models at line-wise granularity and also implement our pruning method on real mobile platforms. We explore the properties of all Transformer components as well as their sparsity features, which are leveraged to guide Transformer model pruning. We name our whole Transformer analysis and pruning pipeline TPrune. In TPrune, we first propose Block-wise Structured Sparsity Learning (BSSL) to analyze Transformer model properties. Then, based on the characteristics derived from BSSL, we apply Structured Hoyer Square (SHS) to derive the final pruned models. Compared with state-of-the-art Transformer pruning methods, TPrune achieves a higher model compression rate with less performance degradation. Experimental results show that our pruned models achieve 1.16×–1.92× speedup on mobile devices with 0%–8% BLEU score degradation compared with the original Transformer model.
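The abstract does not spell out how BSSL or SHS are formulated, so the following is only a rough sketch of the general idea of line-wise structured regularization and pruning. It assumes that each row of a weight matrix forms one group, that the structured sparsity term is a group-lasso sum of row norms, and that the Hoyer-Square measure takes the common group form (sum of group norms)² / (sum of squared group norms). All function names and the threshold are hypothetical and are not taken from the paper.

```python
import numpy as np


def row_norms(weight):
    """L2 norm of each row; each row is treated as one 'line' group (assumption)."""
    return np.sqrt((weight ** 2).sum(axis=1))


def group_lasso(weight):
    """Group-lasso penalty: the sum of the row norms."""
    return row_norms(weight).sum()


def group_hoyer_square(weight, eps=1e-12):
    """Group Hoyer-Square measure: (sum of row norms)^2 / (sum of squared row norms).

    The value is scale-invariant and lies between 1 (a single nonzero row)
    and the number of rows (all row norms equal). eps guards against a
    division by zero for an all-zero matrix.
    """
    norms = row_norms(weight)
    return norms.sum() ** 2 / (np.sum(norms ** 2) + eps)


def prune_rows(weight, threshold):
    """Zero out rows whose norm is below `threshold` (hypothetical criterion).

    Returns the pruned copy and the boolean mask of kept rows.
    """
    keep = row_norms(weight) >= threshold
    return weight * keep[:, None], keep


if __name__ == "__main__":
    w = np.array([[3.0, 4.0], [0.0, 0.0], [0.06, 0.08], [0.0, 5.0]])
    pruned, keep = prune_rows(w, threshold=1.0)
    print("group lasso:", group_lasso(w))
    print("group Hoyer-Square:", group_hoyer_square(w))
    print("kept rows:", keep)
```

In a training loop, a penalty of this kind would typically be added to the task loss with a small coefficient so that whole rows are driven toward zero before thresholding; removing whole rows (rather than scattered weights) is what lets the pruned matrices stay dense and run faster on ordinary mobile hardware.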


Source journal

ACM Transactions on Cyber-Physical Systems (Computer Science, Interdisciplinary Applications)

CiteScore

5.70

Self-citation rate

4.30%

Articles published