从多个教师网络中学习

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Pub Date : 2017-08-04 DOI:10.1145/3097983.3098135

Shan You, Chang Xu, Chao Xu, D. Tao

{"title":"从多个教师网络中学习","authors":"Shan You, Chang Xu, Chao Xu, D. Tao","doi":"10.1145/3097983.3098135","DOIUrl":null,"url":null,"abstract":"Training thin deep networks following the student-teacher learning paradigm has received intensive attention because of its excellent performance. However, to the best of our knowledge, most existing work mainly considers one single teacher network. In practice, a student may access multiple teachers, and multiple teacher networks together provide comprehensive guidance that is beneficial for training the student network. In this paper, we present a method to train a thin deep network by incorporating multiple teacher networks not only in output layer by averaging the softened outputs (dark knowledge) from different networks, but also in the intermediate layers by imposing a constraint about the dissimilarity among examples. We suggest that the relative dissimilarity between intermediate representations of different examples serves as a more flexible and appropriate guidance from teacher networks. Then triplets are utilized to encourage the consistence of these relative dissimilarity relationships between the student network and teacher networks. Moreover, we leverage a voting strategy to unify multiple relative dissimilarity information provided by multiple teacher networks, which realizes their incorporation in the intermediate layers. Extensive experimental results demonstrated that our method is capable of generating a well-performed student network, with the classification accuracy comparable or even superior to all teacher networks, yet having much fewer parameters and being much faster in running.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"266","resultStr":"{\"title\":\"Learning from Multiple Teacher Networks\",\"authors\":\"Shan You, Chang Xu, Chao Xu, D. Tao\",\"doi\":\"10.1145/3097983.3098135\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Training thin deep networks following the student-teacher learning paradigm has received intensive attention because of its excellent performance. However, to the best of our knowledge, most existing work mainly considers one single teacher network. In practice, a student may access multiple teachers, and multiple teacher networks together provide comprehensive guidance that is beneficial for training the student network. In this paper, we present a method to train a thin deep network by incorporating multiple teacher networks not only in output layer by averaging the softened outputs (dark knowledge) from different networks, but also in the intermediate layers by imposing a constraint about the dissimilarity among examples. We suggest that the relative dissimilarity between intermediate representations of different examples serves as a more flexible and appropriate guidance from teacher networks. Then triplets are utilized to encourage the consistence of these relative dissimilarity relationships between the student network and teacher networks. Moreover, we leverage a voting strategy to unify multiple relative dissimilarity information provided by multiple teacher networks, which realizes their incorporation in the intermediate layers. Extensive experimental results demonstrated that our method is capable of generating a well-performed student network, with the classification accuracy comparable or even superior to all teacher networks, yet having much fewer parameters and being much faster in running.\",\"PeriodicalId\":314049,\"journal\":{\"name\":\"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-08-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"266\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3097983.3098135\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3097983.3098135","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 266

摘要

基于师生学习模式的深度网络训练因其优异的性能而受到广泛关注。然而，据我们所知，大多数现有的工作主要是考虑一个单一的教师网络。在实践中，一个学生可以访问多个教师，多个教师网络共同提供全面的指导，有利于培养学生网络。在本文中，我们提出了一种方法，通过结合多个教师网络来训练一个薄深度网络，不仅在输出层通过平均来自不同网络的软化输出(暗知识)，而且在中间层通过施加关于示例之间不相似性的约束。我们认为，不同示例的中间表示之间的相对差异可以作为教师网络更灵活和适当的指导。然后使用三元组来鼓励学生网络和教师网络之间这些相对不相似关系的一致性。此外，我们利用投票策略将多个教师网络提供的多个相对不相似信息统一起来，实现了它们在中间层的整合。大量的实验结果表明，我们的方法能够生成一个性能良好的学生网络，其分类精度与所有教师网络相当甚至更好，但参数少得多，运行速度快得多。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Learning from Multiple Teacher Networks

Training thin deep networks following the student-teacher learning paradigm has received intensive attention because of its excellent performance. However, to the best of our knowledge, most existing work mainly considers one single teacher network. In practice, a student may access multiple teachers, and multiple teacher networks together provide comprehensive guidance that is beneficial for training the student network. In this paper, we present a method to train a thin deep network by incorporating multiple teacher networks not only in output layer by averaging the softened outputs (dark knowledge) from different networks, but also in the intermediate layers by imposing a constraint about the dissimilarity among examples. We suggest that the relative dissimilarity between intermediate representations of different examples serves as a more flexible and appropriate guidance from teacher networks. Then triplets are utilized to encourage the consistence of these relative dissimilarity relationships between the student network and teacher networks. Moreover, we leverage a voting strategy to unify multiple relative dissimilarity information provided by multiple teacher networks, which realizes their incorporation in the intermediate layers. Extensive experimental results demonstrated that our method is capable of generating a well-performed student network, with the classification accuracy comparable or even superior to all teacher networks, yet having much fewer parameters and being much faster in running.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

自引率

0.00%

发文量