Efficient knowledge distillation of teacher model to multiple student models

Thrivikram Gl, Vidya Ganesh, T. Sethuraman, Satheesh K. Perepu
Published in: 2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), 2021-07-27
DOI: 10.1109/IAICT52856.2021.9532543 (https://doi.org/10.1109/IAICT52856.2021.9532543)

Abstract

Deep learning models have been shown to deliver satisfactory results when learning complex non-linear relationships between a set of input features and different task outputs. However, they are memory intensive and require substantial computational power for both training and inference. The literature offers various model compression techniques that enable easier deployment on edge devices. Knowledge distillation is one such approach, in which the knowledge of a complex teacher model is transferred to a student model with fewer parameters. Its limitation is that the architecture of the student model should be comparable to that of the complex teacher model for effective knowledge transfer. As a result, a student model that learns from a large, complex teacher cannot be deployed on edge devices. In this work, we propose a combined student approach in which different student models learn from a common teacher model. Further, we propose a unique loss function that trains multiple student models simultaneously. An advantage of this approach is that these student models can be as simple as possible compared with both the traditional single student model and the complex teacher model. Finally, we provide an extensive evaluation showing that our approach can improve the overall accuracy significantly and allows a further compression of 10% compared with a generic model.
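The abstract does not give the form of the proposed loss function, so the following is only a minimal sketch of one plausible way to train several students against a shared teacher. It assumes the standard temperature-scaled distillation loss (cross-entropy on the true label plus KL divergence to the teacher's softened outputs) and simply sums it over students. The function names and the parameters `temperature` and `alpha` are illustrative assumptions, not the authors' formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label] + eps)

def multi_student_loss(teacher_logits, students_logits, label,
                       temperature=2.0, alpha=0.5):
    """Sum of per-student distillation losses against one shared teacher.

    ASSUMPTION: this is the generic distillation loss summed over
    students, not the loss function proposed in the paper.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    total = 0.0
    for logits in students_logits:
        # Hard-label term at temperature 1.
        hard = cross_entropy(softmax(logits), label)
        # Soft-label term: match the teacher's softened distribution.
        soft = kl_divergence(teacher_soft, softmax(logits, temperature))
        total += alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
    return total
```

In this sketch, a single backward pass through the summed loss would update all students at once; whether the paper couples the students more tightly (for example, through a shared ensemble output) is not stated in the abstract.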