Attention-based Feature Interaction for Efficient Online Knowledge Distillation

Tongtong Su, Qiyu Liang, Jinsong Zhang, Zhaoyang Yu, Gang Wang, Xiaoguang Liu
{"title":"Attention-based Feature Interaction for Efficient Online Knowledge Distillation","authors":"Tongtong Su, Qiyu Liang, Jinsong Zhang, Zhaoyang Yu, Gang Wang, Xiaoguang Liu","doi":"10.1109/ICDM51629.2021.00069","DOIUrl":null,"url":null,"abstract":"Existing online knowledge distillation (KD) methods solve the dependency problem of the high-capacity teacher model via mutual learning and ensemble learning. But they focus on the utilization of logits information in the last few layers and fail to construct a strong teacher model to better supervise student networks, leading to the inefficiency of KD. In this work, we propose a simple but effective online knowledge distillation algorithm, called Attentive Feature Interaction Distillation (AFID). It applies interactive teaching in which the teacher and the student can send, receive, and give feedback on an equal footing, ultimately promoting the generality of both. Specifically, we set up a Feature Interaction Module for two sub-networks to conduct low-level and mid-level feature learning. They can alternately transfer attentive features maps to exchange interesting regions and fuse the other party’s map with the features of self-extraction for information enhancement. Besides, we assign a Feature Fusion Module, in which a Peer Fused Teacher is formed to fuse the output features of two sub-networks to guide sub-networks and a Peer Ensemble Teacher is established to accomplish mutual learning between the two teachers. Integrating Feature Interaction Module and Feature Fusion Module into a unified framework takes full advantage of the interactive teaching mechanism and makes the two sub-networks capture and transfer more fine-grained features to each other. Experimental results on CIFAR-100 and ImageNet ILSVRC 2012 real datasets show that AFID achieves significant performance improvements compared with existing online KD and classical teacher-guide methods.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM51629.2021.00069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Existing online knowledge distillation (KD) methods remove the dependency on a high-capacity teacher model via mutual learning and ensemble learning. However, they focus on exploiting logit information from the last few layers and fail to construct a strong teacher model to better supervise the student networks, which makes distillation inefficient. In this work, we propose a simple but effective online knowledge distillation algorithm called Attentive Feature Interaction Distillation (AFID). It applies interactive teaching, in which the teacher and the student send, receive, and give feedback on an equal footing, ultimately improving the generalization of both. Specifically, we set up a Feature Interaction Module through which two sub-networks conduct low-level and mid-level feature learning. The sub-networks alternately transfer attentive feature maps to exchange regions of interest, and each fuses the other party's map with its self-extracted features for information enhancement. In addition, we introduce a Feature Fusion Module, in which a Peer Fused Teacher is formed by fusing the output features of the two sub-networks to guide them, and a Peer Ensemble Teacher is established to accomplish mutual learning between the two teachers. Integrating the Feature Interaction Module and the Feature Fusion Module into a unified framework takes full advantage of the interactive teaching mechanism and enables the two sub-networks to capture and transfer more fine-grained features to each other. Experimental results on the CIFAR-100 and ImageNet ILSVRC 2012 datasets show that AFID achieves significant performance improvements over existing online KD and classical teacher-guided methods.
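To make the two mechanisms concrete, below is a minimal PyTorch sketch of a feature-interaction exchange between two peer sub-networks and a peer-teacher fusion step. The abstract does not specify the attention definition, the fusion rule, or the mutual-learning loss, so this sketch assumes a simple spatial attention (sigmoid over the channel-wise mean), a 1x1-convolution fusion of concatenated features, and a symmetric temperature-scaled KL divergence between the two teachers; all class and function names here are illustrative, not the paper's actual API.

```python
# Hedged sketch of AFID's two mechanisms as described in the abstract.
# The attention definition, fusion rule, and loss are ASSUMPTIONS for
# illustration; the paper's exact formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_attention(feat):
    """Assumed spatial attention: sigmoid over the channel-wise mean."""
    return torch.sigmoid(feat.mean(dim=1, keepdim=True))  # (N, 1, H, W)


class FeatureInteractionModule(nn.Module):
    """Peers exchange attention maps over low-/mid-level features."""

    def forward(self, feat_a, feat_b):
        attn_a = spatial_attention(feat_a)
        attn_b = spatial_attention(feat_b)
        # Each peer reweights its self-extracted features with the OTHER
        # peer's attention map; the residual keeps the original signal.
        return feat_a + feat_a * attn_b, feat_b + feat_b * attn_a


class FeatureFusionModule(nn.Module):
    """Builds a Peer Fused Teacher from the two peers' output features."""

    def __init__(self, channels, num_classes):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, feat_a, feat_b):
        fused = F.relu(self.fuse(torch.cat([feat_a, feat_b], dim=1)))
        pooled = F.adaptive_avg_pool2d(fused, 1).flatten(1)
        return self.classifier(pooled)  # fused-teacher logits


def mutual_kl(logits_p, logits_q, temperature=3.0):
    """Symmetric KL for mutual learning between the two teachers
    (the temperature value is an assumption)."""
    log_p = F.log_softmax(logits_p / temperature, dim=1)
    log_q = F.log_softmax(logits_q / temperature, dim=1)
    p = F.softmax(logits_p / temperature, dim=1)
    q = F.softmax(logits_q / temperature, dim=1)
    kl_pq = F.kl_div(log_p, q, reduction="batchmean")
    kl_qp = F.kl_div(log_q, p, reduction="batchmean")
    return (kl_pq + kl_qp) * temperature ** 2


if __name__ == "__main__":
    # Stand-in intermediate features from two sub-networks.
    feat_a, feat_b = torch.randn(4, 64, 8, 8), torch.randn(4, 64, 8, 8)
    feat_a, feat_b = FeatureInteractionModule()(feat_a, feat_b)

    ffm = FeatureFusionModule(channels=64, num_classes=100)
    fused_logits = ffm(feat_a, feat_b)  # Peer Fused Teacher
    # Peer Ensemble Teacher: average of the peers' own logits (random
    # stand-ins here for the sub-networks' classifier outputs).
    logits_a, logits_b = torch.randn(4, 100), torch.randn(4, 100)
    ensemble_logits = (logits_a + logits_b) / 2
    print(mutual_kl(fused_logits, ensemble_logits).item())
```

In a full training loop, the fused-teacher and ensemble-teacher distributions would additionally supervise each sub-network's own logits alongside the cross-entropy loss; that loop is omitted from this sketch.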