{"title":"Knowledge Distillation by Multiple Student Instance Interaction","authors":"Tian Ni, Haoji Hu","doi":"10.1109/prmvia58252.2023.00038","DOIUrl":null,"url":null,"abstract":"Knowledge distillation is an efficient method in neural network compression, which transfers the knowledge from a high-capacity teacher network to a low-capacity student network. Previous approaches follow the ‘one teacher and one student’ paradigm, which neglects the possibility that interaction of multiple students could boost the distillation performance. In this paper, we propose a novel approach by simultaneously training multiple instances of a student model. By adding the similarity and diversity losses into the baseline knowledge distillation and adaptively adjusting the proportion of these losses according to accuracy changes of multiple student instances, we build a distillation system to make students collaborate and compete with each other, which improves system robustness and performance. Experiments show superior performance of the proposed method over existing offline and online distillation schemes on datasets with various scales.","PeriodicalId":221346,"journal":{"name":"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/prmvia58252.2023.00038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Knowledge distillation is an efficient method for neural network compression that transfers knowledge from a high-capacity teacher network to a low-capacity student network. Previous approaches follow the ‘one teacher, one student’ paradigm, which neglects the possibility that interaction among multiple students could boost distillation performance. In this paper, we propose a novel approach that simultaneously trains multiple instances of a student model. By adding similarity and diversity losses to the baseline knowledge distillation objective and adaptively adjusting the proportion of these losses according to the accuracy changes of the student instances, we build a distillation system in which the students collaborate and compete with each other, improving robustness and performance. Experiments show that the proposed method outperforms existing offline and online distillation schemes on datasets of various scales.
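The abstract does not give the exact loss formulations, so the following PyTorch sketch is purely illustrative of the general setup it describes: for each student instance it combines a cross-entropy term, a standard soft-target distillation term against the teacher, a hypothetical similarity term (pulling a student's softened predictions toward the peer average, i.e. collaboration), and a hypothetical diversity term (penalizing alignment of raw logits with peers, i.e. competition). The weights w_sim and w_div stand in for the adaptively adjusted proportions mentioned in the abstract; the accuracy-based adaptation rule itself is not reproduced here.

```python
import torch
import torch.nn.functional as F


def multi_student_kd_loss(teacher_logits, student_logits_list, labels,
                          temperature=4.0, w_sim=0.5, w_div=0.1):
    """Illustrative per-instance loss for a multi-student KD setup (assumed form).

    For each student instance, combines:
      - cross-entropy with the ground-truth labels,
      - a standard soft-target KD term against the teacher,
      - a similarity term (collaboration): KL toward the peers' averaged
        softened predictions,
      - a diversity term (competition): penalty on logit alignment with peers.
    w_sim and w_div are placeholders for the adaptively adjusted proportions;
    the adaptation based on accuracy changes is not implemented here.
    """
    t = temperature
    soft_teacher = F.softmax(teacher_logits.detach() / t, dim=1)
    losses = []
    for i, logits in enumerate(student_logits_list):
        peers = [p.detach() for j, p in enumerate(student_logits_list) if j != i]

        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(F.log_softmax(logits / t, dim=1), soft_teacher,
                      reduction="batchmean") * t * t

        # Similarity: pull this student's softened output toward the peer mean.
        peer_mean = torch.stack([F.softmax(p / t, dim=1) for p in peers]).mean(0)
        sim = F.kl_div(F.log_softmax(logits / t, dim=1), peer_mean,
                       reduction="batchmean") * t * t

        # Diversity: discourage the raw logits from pointing in the same
        # direction as the peers' logits (higher cosine similarity -> higher loss).
        div = torch.stack([F.cosine_similarity(logits, p, dim=1).mean()
                           for p in peers]).mean()

        losses.append(ce + kd + w_sim * sim + w_div * div)
    return losses


# Toy usage with random logits for a teacher and three student instances.
if __name__ == "__main__":
    torch.manual_seed(0)
    labels = torch.randint(0, 10, (8,))
    teacher = torch.randn(8, 10)
    students = [torch.randn(8, 10, requires_grad=True) for _ in range(3)]
    for k, loss in enumerate(multi_student_kd_loss(teacher, students, labels)):
        print(f"student {k}: loss = {loss.item():.4f}")
```

In a real training loop, each per-instance loss would be backpropagated through its own student network, and the proportions w_sim and w_div would be updated according to how each instance's accuracy evolves, as the paper proposes.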