Network Specialization via Feature-level Knowledge Distillation

Gaowen Liu, Yuzhang Shang, Yuguang Yao, R. Kompella
{"title":"Network Specialization via Feature-level Knowledge Distillation","authors":"Gaowen Liu, Yuzhang Shang, Yuguang Yao, R. Kompella","doi":"10.1109/CVPRW59228.2023.00339","DOIUrl":null,"url":null,"abstract":"State-of-the-art model specialization methods are mainly based on fine-tuning a pre-trained machine learning model to fit the specific needs of a particular task or application. Or by modifying the architecture of the model itself. However, these methods are not preferable in industrial applications because of the model’s large size and the complexity of the training process. In this paper, the difficulty of network specialization is attributed to overfitting caused by a lack of data, and we propose a novel model specialization method by Knowledge Distillation (SKD). The proposed methods merge transfer learning and model compression into one stage. Specifically, we distill and transfer knowledge at the feature map level, circumventing logit-level inconsistency between teacher and student. We empirically investigate and prove the effects of the three parts: Models can be specialized to customer use cases by knowledge distillation. Knowledge distillation can effectively regularize the knowledge transfer process to a smaller, task-specific model. Compared with classical methods such as training a model from scratch and model fine-tuning, our methods achieve comparable and much better results and have better training efficiency on the CIFAR-100 dataset for image classification tasks. This paper proves the great potential of model specialization by knowledge distillation.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW59228.2023.00339","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

State-of-the-art model specialization methods are mainly based on fine-tuning a pre-trained machine learning model to fit the specific needs of a particular task or application, or on modifying the architecture of the model itself. However, these methods are not preferable in industrial applications because of the model's large size and the complexity of the training process. In this paper, the difficulty of network specialization is attributed to overfitting caused by a lack of data, and we propose a novel method for model Specialization by Knowledge Distillation (SKD). The proposed method merges transfer learning and model compression into a single stage. Specifically, we distill and transfer knowledge at the feature-map level, circumventing logit-level inconsistency between teacher and student. We empirically investigate and demonstrate three effects: models can be specialized to customer use cases by knowledge distillation; knowledge distillation effectively regularizes the transfer of knowledge into a smaller, task-specific model; and, compared with classical approaches such as training a model from scratch or fine-tuning, our method achieves comparable or substantially better results with higher training efficiency on the CIFAR-100 dataset for image classification. This paper demonstrates the great potential of model specialization by knowledge distillation.
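To make the feature-map-level distillation idea concrete, the sketch below shows one common way such a loss can be set up in PyTorch: the student's intermediate feature map is projected through a 1x1 convolution and matched to the teacher's feature map with an MSE penalty, so no logit-level agreement is required. This is a minimal illustration under stated assumptions, not the paper's implementation; the module names, channel sizes, and weighting factor are hypothetical.

```python
# Minimal sketch of feature-map-level knowledge distillation (assumption: PyTorch;
# names, shapes, and the weighting factor are illustrative, not from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureKDLoss(nn.Module):
    """Match student feature maps to teacher feature maps with an MSE penalty,
    avoiding any dependence on logit-level consistency."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv adapter so student features can be compared with teacher
        # features even when their channel counts differ (a common choice,
        # assumed here rather than taken from the paper).
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # Teacher features are treated as fixed targets (no gradient flows into the teacher).
        return F.mse_loss(self.adapter(student_feat), teacher_feat.detach())

# Hypothetical usage: combine the distillation term with the ordinary task loss.
if __name__ == "__main__":
    kd = FeatureKDLoss(student_channels=64, teacher_channels=256)
    s_feat = torch.randn(8, 64, 16, 16)    # student feature map (N, C, H, W)
    t_feat = torch.randn(8, 256, 16, 16)   # teacher feature map at the same spatial size
    task_loss = torch.tensor(0.7)          # placeholder for a cross-entropy value
    lam = 1.0                              # illustrative weighting factor
    total = task_loss + lam * kd(s_feat, t_feat)
    print(total.item())
```

In practice the distillation term would be computed from hooked intermediate activations of the teacher and student networks during training; the standalone tensors above only stand in for those activations.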