Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Impact factor 3.2, JCR Q1 (Computer Science)

APSIPA Transactions on Signal and Information Processing. Pub Date: 2021-11-17. DOI: 10.1017/ATSIP.2021.16

Hsing-Hung Chou, Ching-Te Chiu, Yi-Ping Liao

Citations: 1

Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Deep neural networks (DNNs) have solved many tasks, including image classification, object detection, and semantic segmentation. However, a DNN model with a large number of parameters and a high computational cost is difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that consists of three parts. First, we propose a cross-layer matrix to extract more features from the teacher model. Second, we adopt Kullback-Leibler (KL) divergence in an offline environment so that the student model finds a wider, robust minimum. Finally, we propose an offline ensemble of pre-trained teachers to teach a student model. To address the dimension mismatch between teacher and student models, we adopt a $1\times 1$ convolution and two-stage knowledge distillation to relax this constraint. We conducted experiments with VGG and ResNet models on the CIFAR-100 dataset. With VGG-11 as the teacher model and VGG-6 as the student model, the Top-1 accuracy increased by 3.57% with a $2.08\times$ compression rate and a $3.5\times$ computation rate. With ResNet-32 as the teacher model and ResNet-8 as the student model, the Top-1 accuracy increased by 4.38% with a $6.11\times$ compression rate and a $5.27\times$ computation rate. In addition, we conducted experiments on the ImageNet $64\times 64$ dataset. With MobileNet-16 as the teacher model and MobileNet-9 as the student model, the Top-1 accuracy increased by 3.98% with a $1.59\times$ compression rate and a $2.05\times$ computation rate.
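The abstract names KL divergence and an offline ensemble of pre-trained teachers but does not spell out the loss. The following is a minimal sketch of one common way such a loss is written, not the authors' implementation. It assumes that the teachers' temperature-softened predictions are simply averaged and that the loss is scaled by the squared temperature, as in standard knowledge distillation practice. All function names and the `temperature` parameter are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def ensemble_soft_targets(teacher_logits_list, temperature):
    """Average the softened predictions of several pre-trained teachers.

    Assumption: plain averaging; the abstract does not say how the
    offline ensemble is combined.
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    count = len(probs)
    return [sum(column) / count for column in zip(*probs)]


def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    """KL(teacher ensemble || student), scaled by T^2 (assumed convention)."""
    target = ensemble_soft_targets(teacher_logits_list, temperature)
    student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target, student)


if __name__ == "__main__":
    # Toy logits for a 3-class problem; the values are illustrative only.
    teachers = [[2.0, 1.0, 0.1], [1.8, 1.2, 0.0]]
    print(distillation_loss([1.9, 1.1, 0.05], teachers))  # student close to teachers
    print(distillation_loss([0.1, 1.0, 2.0], teachers))   # student far from teachers
```

A student whose logits rank the classes like the teachers gets a smaller loss than one that ranks them in the opposite order, which is the signal that distillation training would minimize alongside the usual classification loss.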

Source journal

APSIPA Transactions on Signal and Information Processing (Engineering, Electrical & Electronic)

CiteScore: 8.60

Self-citation rate: 6.20%

Articles published:

Review time: 40 weeks