Knowledge Distillation via Weighted Ensemble of Teaching Assistants

Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng
{"title":"Knowledge Distillation via Weighted Ensemble of Teaching Assistants","authors":"Durga Prasad Ganta, Himel Das Gupta, Victor S. Sheng","doi":"10.1109/ICKG52313.2021.00014","DOIUrl":null,"url":null,"abstract":"Knowledge distillation in machine learning is the process of transferring knowledge from a large model called teacher to a smaller model called student. Knowledge distillation is one of the techniques to compress the large network (teacher) to a smaller network (student) that can be deployed in small devices such as mobile phones. When the network size gap between the teacher and student increases, the performance of the student network decreases. To solve this problem, an intermediate model is employed between the teacher model and the student model known as the teaching assistant model, which in turn bridges the gap between the teacher and the student. In this research, we have shown that using multiple teaching assistant models, the student model (the smaller model) can be further improved. We combined these multiple teaching assistant model using weighted ensemble learning where we have used a differential evaluation optimization algorithm to generate the weight values.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Big Knowledge (ICBK)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICKG52313.2021.00014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Knowledge distillation in machine learning is the process of transferring knowledge from a large model, called the teacher, to a smaller model, called the student. It is one of the techniques for compressing a large network (the teacher) into a smaller network (the student) that can be deployed on small devices such as mobile phones. As the size gap between the teacher and the student grows, the performance of the student network decreases. To address this problem, an intermediate model, known as the teaching assistant, is placed between the teacher and the student to bridge the gap between them. In this research, we show that the student model (the smaller model) can be further improved by using multiple teaching assistant models. We combine these teaching assistant models through weighted ensemble learning, using a differential evolution optimization algorithm to generate the weight values.
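The abstract does not spell out implementation details, so the sketch below only illustrates the weighted-ensemble idea: several teaching-assistant (TA) models produce softened predictions, and differential evolution (here via scipy.optimize.differential_evolution) searches for the combination weights that minimize a validation loss. The placeholder logits, temperature, and loss terms are assumptions made for this example, not the authors' exact setup.

```python
# Minimal sketch (not the authors' code): weighting several teaching-assistant
# models with differential evolution, then forming the soft targets that would
# be used to distill the student.
import numpy as np
from scipy.optimize import differential_evolution
from scipy.special import softmax

rng = np.random.default_rng(0)

# Placeholder validation data: logits from 3 TAs on 500 examples, 10 classes.
n_tas, n_examples, n_classes = 3, 500, 10
ta_logits = rng.normal(size=(n_tas, n_examples, n_classes))
labels = rng.integers(0, n_classes, size=n_examples)

temperature = 4.0  # softening temperature, a common choice in distillation
ta_probs = softmax(ta_logits / temperature, axis=-1)  # shape (n_tas, N, C)

def ensemble_nll(weights):
    """Negative log-likelihood of the weighted TA ensemble on the validation set."""
    w = np.clip(weights, 0.0, None)
    w = w / (w.sum() + 1e-12)                    # normalize to a convex combination
    mixed = np.tensordot(w, ta_probs, axes=1)    # (N, C) weighted soft predictions
    return -np.mean(np.log(mixed[np.arange(n_examples), labels] + 1e-12))

# Differential evolution searches over one weight per teaching assistant.
result = differential_evolution(ensemble_nll, bounds=[(0.0, 1.0)] * n_tas, seed=0)
weights = result.x / result.x.sum()
print("TA ensemble weights:", weights)

# The weighted soft targets would then replace a single teacher's output in the
# usual distillation loss:
#   L = alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(soft_targets || student_probs)
soft_targets = np.tensordot(weights, ta_probs, axes=1)
```

Because differential evolution is gradient-free, it can tune a handful of ensemble weights directly against a validation metric without backpropagating through the teaching-assistant models; the resulting weighted soft targets then stand in for a single teacher's output in the standard distillation loss.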