Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

IF 3.2 Q1 Computer Science
Hsing-Hung Chou, Ching-Te Chiu, Yi-Ping Liao
{"title":"基于KL发散和离线集成的跨层知识提取压缩深度神经网络","authors":"Hsing-Hung Chou, Ching-Te Chiu, Yi-Ping Liao","doi":"10.1017/ATSIP.2021.16","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNN) have solved many tasks, including image classification, object detection, and semantic segmentation. However, when there are huge parameters and high level of computation associated with a DNN model, it becomes difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that can be split into three parts. First, we propose a cross-layer matrix to extract more features from the teacher's model. Second, we adopt Kullback Leibler (KL) Divergence in an offline environment to make the student model find a wider robust minimum. Finally, we propose the offline ensemble pre-trained teachers to teach a student model. To address dimension mismatch between teacher and student models, we adopt a $1\\times 1$ convolution and two-stage knowledge distillation to release this constraint. We conducted experiments with VGG and ResNet models, using the CIFAR-100 dataset. With VGG-11 as the teacher's model and VGG-6 as the student's model, experimental results showed that the Top-1 accuracy increased by 3.57% with a $2.08\\times$ compression rate and 3.5x computation rate. With ResNet-32 as the teacher's model and ResNet-8 as the student's model, experimental results showed that Top-1 accuracy increased by 4.38% with a $6.11\\times$ compression rate and $5.27\\times$ computation rate. In addition, we conducted experiments using the ImageNet$64\\times 64$ dataset. With MobileNet-16 as the teacher's model and MobileNet-9 as the student's model, experimental results showed that the Top-1 accuracy increased by 3.98% with a $1.59\\times$ compression rate and $2.05\\times$ computation rate.","PeriodicalId":44812,"journal":{"name":"APSIPA Transactions on Signal and Information Processing","volume":null,"pages":null},"PeriodicalIF":3.2000,"publicationDate":"2021-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network\",\"authors\":\"Hsing-Hung Chou, Ching-Te Chiu, Yi-Ping Liao\",\"doi\":\"10.1017/ATSIP.2021.16\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNN) have solved many tasks, including image classification, object detection, and semantic segmentation. However, when there are huge parameters and high level of computation associated with a DNN model, it becomes difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that can be split into three parts. First, we propose a cross-layer matrix to extract more features from the teacher's model. Second, we adopt Kullback Leibler (KL) Divergence in an offline environment to make the student model find a wider robust minimum. Finally, we propose the offline ensemble pre-trained teachers to teach a student model. To address dimension mismatch between teacher and student models, we adopt a $1\\\\times 1$ convolution and two-stage knowledge distillation to release this constraint. We conducted experiments with VGG and ResNet models, using the CIFAR-100 dataset. 
With VGG-11 as the teacher's model and VGG-6 as the student's model, experimental results showed that the Top-1 accuracy increased by 3.57% with a $2.08\\\\times$ compression rate and 3.5x computation rate. With ResNet-32 as the teacher's model and ResNet-8 as the student's model, experimental results showed that Top-1 accuracy increased by 4.38% with a $6.11\\\\times$ compression rate and $5.27\\\\times$ computation rate. In addition, we conducted experiments using the ImageNet$64\\\\times 64$ dataset. With MobileNet-16 as the teacher's model and MobileNet-9 as the student's model, experimental results showed that the Top-1 accuracy increased by 3.98% with a $1.59\\\\times$ compression rate and $2.05\\\\times$ computation rate.\",\"PeriodicalId\":44812,\"journal\":{\"name\":\"APSIPA Transactions on Signal and Information Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2021-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"APSIPA Transactions on Signal and Information Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1017/ATSIP.2021.16\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"APSIPA Transactions on Signal and Information Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1017/ATSIP.2021.16","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Computer Science","Score":null,"Total":0}
Citations: 1

Abstract

Deep neural networks (DNNs) have solved many tasks, including image classification, object detection, and semantic segmentation. However, when a DNN model has a huge number of parameters and a high computational cost, it becomes difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that can be split into three parts. First, we propose a cross-layer matrix to extract more features from the teacher's model. Second, we adopt Kullback-Leibler (KL) divergence in an offline environment to make the student model find a wider robust minimum. Finally, we propose an offline ensemble of pre-trained teachers to teach the student model. To address the dimension mismatch between teacher and student models, we adopt a $1\times 1$ convolution and two-stage knowledge distillation to release this constraint. We conducted experiments with VGG and ResNet models, using the CIFAR-100 dataset. With VGG-11 as the teacher model and VGG-6 as the student model, experimental results showed that the Top-1 accuracy increased by 3.57% with a $2.08\times$ compression rate and a $3.5\times$ computation rate. With ResNet-32 as the teacher model and ResNet-8 as the student model, experimental results showed that the Top-1 accuracy increased by 4.38% with a $6.11\times$ compression rate and a $5.27\times$ computation rate. In addition, we conducted experiments using the ImageNet $64\times 64$ dataset. With MobileNet-16 as the teacher model and MobileNet-9 as the student model, experimental results showed that the Top-1 accuracy increased by 3.98% with a $1.59\times$ compression rate and a $2.05\times$ computation rate.
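
The abstract combines three mechanisms: a KL-divergence soft-target loss, an offline ensemble of pre-trained (frozen) teachers, and a $1\times 1$ convolution that resolves the channel mismatch when a student feature map is compared against a teacher feature map. The sketch below shows, in PyTorch, one plausible way these pieces fit together; the temperature, the loss weights, and names such as `FeatureAdapter` and `distillation_loss` are illustrative assumptions for this sketch, not the paper's released code.

```python
# Minimal sketch of KL-based distillation with a 1x1 adapter and an offline teacher ensemble.
# Names and hyperparameters are assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAdapter(nn.Module):
    """1x1 convolution mapping a student feature map to the teacher's channel count,
    releasing the dimension-mismatch constraint mentioned in the abstract."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1, bias=False)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)


def kl_soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


def offline_ensemble_logits(teacher_logit_list):
    """Average the logits of several pre-trained, frozen ('offline') teachers."""
    return torch.stack(teacher_logit_list, dim=0).mean(dim=0)


def distillation_loss(student_logits, teacher_logit_list, labels,
                      student_feat, teacher_feat, adapter,
                      alpha=0.7, beta=0.1, temperature=4.0):
    """Hard-label cross-entropy + KL soft-target loss against the teacher ensemble
    + an L2 feature loss through the 1x1 adapter (cross-layer hint)."""
    ce = F.cross_entropy(student_logits, labels)
    ensemble = offline_ensemble_logits(teacher_logit_list)
    kd = kl_soft_target_loss(student_logits, ensemble, temperature)
    hint = F.mse_loss(adapter(student_feat), teacher_feat)
    return (1 - alpha) * ce + alpha * kd + beta * hint
```

Under the two-stage scheme mentioned in the abstract, a common arrangement is to first minimize only the feature-hint term (training the adapter and the early student layers) and then switch to the combined loss; that ordering is an assumption about typical two-stage distillation, not a claim about the paper's exact schedule.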
Source journal
APSIPA Transactions on Signal and Information Processing (Engineering, Electrical & Electronic)
CiteScore: 8.60
Self-citation rate: 6.20%
Articles published: 30
Review time: 40 weeks