Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination

Cole Hawkins, Xing-er Liu, Zheng Zhang
SIAM Journal on Mathematics of Data Science (IF 1.9, Q1 Mathematics, Applied), pp. 46-71
DOI: 10.1137/21m1391444 · Published 2020-10-17 · Cited by 17

Abstract

While post-training model compression can greatly reduce the inference cost of a deep neural network, uncompressed training still consumes enormous hardware resources, runtime, and energy. It is highly desirable to train a compact neural network directly from scratch with low memory and low computational cost. Low-rank tensor decomposition is one of the most effective approaches for reducing the memory and computing requirements of large neural networks. However, directly training a low-rank tensorized neural network is challenging because the tensor rank, which controls the model complexity and compression ratio during training, is hard to determine a priori. This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks. We first develop a flexible Bayesian model that can handle various low-rank tensor formats (e.g., CP, Tucker, tensor train, and tensor-train matrix) to compress neural network parameters during training. This model can automatically determine the tensor ranks inside a nonlinear forward model, which is beyond the capability of existing Bayesian tensor methods. We further develop a scalable stochastic variational inference solver to estimate the posterior density of large-scale problems in training. Our work provides the first general-purpose rank-adaptive framework for end-to-end tensorized training. Our numerical results on various neural network architectures show orders-of-magnitude parameter reduction and little accuracy loss (or even accuracy gain) during training. Specifically, on a very large deep learning recommendation system with over $4.2\times 10^9$ model parameters, our method automatically reduces the number of variables to only $1.6\times 10^5$ during training (a $2.6\times 10^4\times$ reduction) while achieving almost the same accuracy.
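To make the compression mechanism concrete, below is a minimal NumPy sketch of the tensor-train matrix (TT-matrix) format mentioned in the abstract, applied to a single fully connected layer. This is an illustration under stated assumptions, not the authors' implementation: the factor modes and TT-ranks are fixed by hand, whereas the paper's Bayesian prior determines the ranks automatically during training, and all names below are illustrative.

```python
# Minimal sketch of a tensor-train matrix (TT-matrix) parameterization of a
# dense weight matrix, assuming the standard TT-matrix definition. The ranks
# are hand-picked here; in the paper they are inferred by a Bayesian prior.
import numpy as np

# Factorize a 64x64 weight matrix: both dimensions split as 64 = 4*4*4.
in_modes, out_modes = [4, 4, 4], [4, 4, 4]
ranks = [1, 3, 3, 1]  # TT-ranks r_0..r_3; the boundary ranks are always 1.

rng = np.random.default_rng(0)
# Core k has shape (r_{k-1}, m_k, n_k, r_k).
cores = [
    rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
    for k in range(len(in_modes))
]

def tt_to_matrix(cores, in_modes, out_modes):
    """Contract the TT cores back into the full (prod m_k) x (prod n_k) matrix."""
    full = cores[0][0]  # drop the leading rank r_0 = 1 -> shape (m_1, n_1, r_1)
    for core in cores[1:]:
        # Contract the trailing rank axis of `full` with the leading rank
        # axis of the next core.
        full = np.tensordot(full, core, axes=1)
    full = full[..., 0]  # drop the trailing rank r_d = 1
    # Axes are interleaved (m_1, n_1, m_2, n_2, ...); group the m's and n's.
    d = len(in_modes)
    full = full.transpose(list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2)))
    return full.reshape(int(np.prod(in_modes)), int(np.prod(out_modes)))

W = tt_to_matrix(cores, in_modes, out_modes)
n_tt = sum(core.size for core in cores)
print(W.shape, n_tt, W.size)  # (64, 64) 240 4096 -> about 17x fewer parameters
```

Storage drops from $\prod_k m_k n_k$ entries for the dense matrix to $\sum_k r_{k-1} m_k n_k r_k$ entries for the cores, which is the source of the orders-of-magnitude reduction reported above; pruning a rank $r_k$ to a smaller value, as the paper's Bayesian prior does automatically, simply shrinks the corresponding core slices.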