Accelerating Convolutional Neural Networks with Dynamic Channel Pruning

Chiliang Zhang, Tao Hu, Yingda Guan, Zuochang Ye
Published in: 2019 Data Compression Conference (DCC), 2019-03-01
DOI: 10.1109/DCC.2019.00075 (https://doi.org/10.1109/DCC.2019.00075)
Citations: 10

Abstract

Network acceleration has become a hot topic because deploying convolutional neural networks in real-time applications or on resource-limited devices remains a substantial challenge. A wide variety of pruning-based acceleration methods have been proposed to exploit the sparsity of parameters and thereby omit the computations involving the pruned parameters. However, these element-wise pruning methods can hardly deliver a speed-up without specially customized algorithms. Because of this difficulty, recent work has turned to pruning filters or channels instead, which directly reduces the number of matrix multiplications. While Channel Pruning converts the original CNN into a kernel-wise or channel-wise pruned one, Runtime Neural Pruning (RNP) argues that models pruned with static methods lose capability on some hard tasks, since potentially significant weights are lost during pruning. Dynamically pruning the channels is found to be a good solution. In this paper, we propose Channel Threshold-Weighting (T-Weighting) modules that select and prune unimportant feature channels at inference time. Because the pruning is done dynamically, the method is called Dynamic Channel Pruning (DCP). DCP consists of the original convolutional network plus a number of "Channel T-Weighting" modules at certain layers. Each "Channel T-Weighting" module assigns weights to the corresponding channels and prunes those channels whose weights are zero. The pruned channels accelerate the CNN, and the remaining channels, multiplied by their weights, enhance the feature representation. We do not consider fully-connected layers for two reasons: (1) convolution operations account for the vast majority of the computation cost; (2) DCP is designed not only for classification but for many tasks that use a CNN as their backbone network.

In this work, as a specific choice for h(·), we propose the thresholded sigmoid (T-sigmoid) function, which gives sparsity to w_l: h(x) = σ(x) · 1{x > T}, where σ(·) is the sigmoid function and 1{·} is the indicator function, which outputs 1 when its argument is true and 0 otherwise. The T-sigmoid function is inspired by spike-and-slab models, which formulate distributions over hidden variables as the product of a binary spike variable and a real-valued code. DCP is trained layer by layer: we first train the "Channel T-Weighting" module, then set the threshold according to the given pruning ratio, and finally adjust the threshold iteratively.

The proposed DCP reaches a 5× speed-up with only a 4.77% drop on the ILSVRC2012 dataset. In terms of increased error, DCP consistently outperforms the baseline methods (Filter Pruning, Channel Pruning, and RNP) as the speed-up ratio increases. Experiments show that DCP also consistently outperforms the baseline model on CIFAR-10 and CIFAR-100. Comparing the full model with the accelerated model (3×), DCP generalizes well to the scene classification task (on the Places365-Challenge dataset) with VGG-16: top-1 and top-5 accuracy drop by 2.07% and 1.96%, respectively. DCP (3×) with ResNet-50 also suffers only slight drops, with top-1 and top-5 accuracy falling by 2.78% and 2.55%, respectively, outperforming Channel Pruning (our implementation) by a large margin. For the detection task on the PASCAL VOC2007 dataset using Faster R-CNN, we observe mAP drops of 0.5% and 1.7% for our 2× and 4× accelerated models, respectively, a competitive result showing that DCP also generalizes well to detection with little accuracy degradation.
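
The T-sigmoid gate and the ratio-based threshold described above can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not say how the per-channel pre-activations are produced, so `scores` is taken here as a given array, and `threshold_for_ratio` and `gate_channels` are hypothetical helper names. Picking T as a quantile of the scores is only one plausible way to "set the threshold based on the given pruning ratio"; the iterative adjustment step is omitted.

```python
import numpy as np

def t_sigmoid(x, threshold):
    """h(x) = sigmoid(x) * 1{x > T}: sigmoid where x > threshold, exactly 0 elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > threshold, 1.0 / (1.0 + np.exp(-x)), 0.0)

def threshold_for_ratio(scores, prune_ratio):
    """Assumed rule: choose T as the prune_ratio-quantile of the channel scores."""
    return float(np.quantile(np.asarray(scores, dtype=np.float64), prune_ratio))

def gate_channels(features, scores, threshold):
    """Weight a (C, H, W) feature map per channel and drop zero-weight channels.

    Returns the surviving, re-weighted channels and their original indices.
    """
    weights = t_sigmoid(scores, threshold)
    keep = np.flatnonzero(weights)  # sigmoid > 0, so zeros are exactly the pruned channels
    return features[keep] * weights[keep, None, None], keep

# Toy input: six channels with made-up scores, pruning about half of them.
scores = np.array([-2.0, -0.5, 0.0, 0.7, 1.5, 3.0])
features = np.ones((6, 4, 4))
T = threshold_for_ratio(scores, 0.5)
gated, kept = gate_channels(features, scores, T)
print(T, kept, gated.shape)
```

Because the pruned channels are removed rather than multiplied by zero, a following convolution would only need to process the channels listed in `kept`, which is where the speed-up in a dynamic scheme of this kind would come from.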