{"title":"温度项在深度神经网络训练中的探索","authors":"Zhaofeng Si, H. Qi","doi":"10.1109/AVSS.2019.8909875","DOIUrl":null,"url":null,"abstract":"Model compression technique is now widely investigated to fit the high-complexity deep neural network into resource-constrained mobile devices in recent years, in which one of effective methods is knowledge distillation. In this paper we make a discussion on the temperature term introduced in knowledge distillation method. The temperature term in distill training is aimed at making it easier for the student network to learn the generalization capablityof teacher network by softening the labels from the teacher network. We analyze the situation of using the temperature term in ordinary training to soften the output of neural network instead of soften the target. In experiments, we show that by applying a proper temperature term in training process, a better performance can be gained on NABirds dataset than using the model without temperature term.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Exploration on Temperature Term in Training Deep Neural Networks\",\"authors\":\"Zhaofeng Si, H. Qi\",\"doi\":\"10.1109/AVSS.2019.8909875\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Model compression technique is now widely investigated to fit the high-complexity deep neural network into resource-constrained mobile devices in recent years, in which one of effective methods is knowledge distillation. In this paper we make a discussion on the temperature term introduced in knowledge distillation method. The temperature term in distill training is aimed at making it easier for the student network to learn the generalization capablityof teacher network by softening the labels from the teacher network. We analyze the situation of using the temperature term in ordinary training to soften the output of neural network instead of soften the target. 
In experiments, we show that by applying a proper temperature term in training process, a better performance can be gained on NABirds dataset than using the model without temperature term.\",\"PeriodicalId\":243194,\"journal\":{\"name\":\"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AVSS.2019.8909875\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AVSS.2019.8909875","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Model compression techniques have been widely investigated in recent years to fit high-complexity deep neural networks onto resource-constrained mobile devices, and knowledge distillation is one of the most effective of these methods. In this paper we discuss the temperature term introduced by the knowledge distillation method. In distillation training, the temperature term softens the labels produced by the teacher network, making it easier for the student network to learn the teacher's generalization capability. We analyze the use of the temperature term in ordinary training, where it softens the output of the neural network rather than the target. Our experiments show that applying a proper temperature term during training yields better performance on the NABirds dataset than training the same model without one.
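The temperature term referenced in the abstract is the temperature-scaled softmax from Hinton et al.'s knowledge distillation. The abstract does not give the paper's exact formulation, so the sketch below shows the standard distillation loss plus one plausible reading of the paper's variant (softening the model's own output against hard targets); the function names and the choice of T are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def soft_outputs(logits: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
    """Temperature-scaled softmax: T > 1 flattens the distribution,
    exposing the relative probabilities of the non-target classes."""
    return F.softmax(logits / temperature, dim=-1)

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      T: float = 4.0) -> torch.Tensor:
    """Standard distillation term (Hinton et al., 2015): the student matches
    the teacher's softened distribution at temperature T."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T

def tempered_ce_loss(logits: torch.Tensor,
                     targets: torch.Tensor,
                     T: float = 2.0) -> torch.Tensor:
    """Hypothetical reading of the abstract's 'ordinary training' setup:
    divide the logits by T before the cross-entropy against hard labels,
    softening the output rather than the target."""
    return F.cross_entropy(logits / T, targets)
```

With logits of shape (batch, num_classes) and integer class targets, `tempered_ce_loss` is a drop-in replacement for plain cross-entropy; the effect of T > 1 is to reduce the confidence of the softmax, which is consistent with the abstract's claim that a properly chosen temperature improves training.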