Attention-based Feature Interaction for Efficient Online Knowledge Distillation

Tongtong Su, Qiyu Liang, Jinsong Zhang, Zhaoyang Yu, Gang Wang, Xiaoguang Liu

2021 IEEE International Conference on Data Mining (ICDM), December 2021. DOI: 10.1109/ICDM51629.2021.00069
Existing online knowledge distillation (KD) methods address the dependence on a high-capacity teacher model via mutual learning and ensemble learning. However, they focus on exploiting the logit information of the last few layers and fail to construct a strong teacher model that can better supervise the student networks, which limits the efficiency of KD. In this work, we propose a simple but effective online knowledge distillation algorithm called Attentive Feature Interaction Distillation (AFID). It applies interactive teaching, in which the teacher and the student send, receive, and give feedback on an equal footing, ultimately improving the generalization of both. Specifically, we set up a Feature Interaction Module in which two sub-networks conduct low-level and mid-level feature learning. They alternately transfer attentive feature maps to exchange regions of interest and fuse the other party's map with their own extracted features for information enhancement. In addition, we introduce a Feature Fusion Module, in which a Peer Fused Teacher is formed by fusing the output features of the two sub-networks to guide them, and a Peer Ensemble Teacher is established to accomplish mutual learning between the two teachers. Integrating the Feature Interaction Module and the Feature Fusion Module into a unified framework takes full advantage of the interactive teaching mechanism and lets the two sub-networks capture and transfer more fine-grained features to each other. Experimental results on the CIFAR-100 and ImageNet (ILSVRC 2012) datasets show that AFID achieves significant performance improvements over existing online KD methods and classical teacher-guided methods.
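As a rough illustration of the mechanism the abstract describes, the sketch below shows, in PyTorch-style code, how two peer sub-networks might exchange spatial-attention-weighted feature maps and how a Peer Fused Teacher might combine their output features. The module names, the squared-activation spatial attention, and the 1x1-convolution fusion are assumptions introduced here for illustration only; the abstract does not specify the actual implementation, which may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureInteractionModule(nn.Module):
    # Hypothetical sketch: exchange spatial-attention-weighted feature maps
    # between two peer sub-networks (the abstract gives no implementation details).
    def __init__(self, channels):
        super().__init__()
        # Assumed 1x1-conv fusion of the peer's attended map with self-extracted features.
        self.fuse_a = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.fuse_b = nn.Conv2d(2 * channels, channels, kernel_size=1)

    @staticmethod
    def spatial_attention(feat):
        # Channel-averaged squared activations, normalized over spatial positions.
        n, _, h, w = feat.shape
        attn = feat.pow(2).mean(dim=1, keepdim=True)             # (N, 1, H, W)
        attn = F.softmax(attn.view(n, 1, -1), dim=-1).view(n, 1, h, w)
        return attn

    def forward(self, feat_a, feat_b):
        # Each sub-network receives the other's attention-highlighted regions
        # and fuses them with its own features for information enhancement.
        attended_a = feat_a * self.spatial_attention(feat_a)
        attended_b = feat_b * self.spatial_attention(feat_b)
        out_a = self.fuse_a(torch.cat([feat_a, attended_b], dim=1))
        out_b = self.fuse_b(torch.cat([feat_b, attended_a], dim=1))
        return out_a, out_b


class PeerFusedTeacher(nn.Module):
    # Hypothetical sketch: fuse the output features of the two sub-networks
    # into a single, stronger peer teacher that can supervise both of them.
    def __init__(self, channels, num_classes):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, feat_a, feat_b):
        fused = F.relu(self.fuse(torch.cat([feat_a, feat_b], dim=1)))
        pooled = F.adaptive_avg_pool2d(fused, 1).flatten(1)
        return self.classifier(pooled)


if __name__ == "__main__":
    # Toy usage: two 64-channel feature maps from the peer sub-networks.
    fa, fb = torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8)
    fim = FeatureInteractionModule(channels=64)
    oa, ob = fim(fa, fb)
    teacher_logits = PeerFusedTeacher(channels=64, num_classes=100)(oa, ob)
    print(oa.shape, ob.shape, teacher_logits.shape)

In an online KD setting such as the one described, the fused teacher's logits would typically supervise each sub-network with a soft-label (e.g., KL-divergence) loss alongside the usual cross-entropy loss; the specific losses and the Peer Ensemble Teacher's mutual-learning objective are not detailed in the abstract and are therefore omitted from this sketch.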