Efficient Action Recognition via Dynamic Knowledge Propagation

2021 IEEE/CVF International Conference on Computer Vision (ICCV). Pub Date: 2021-10-01. DOI: 10.1109/ICCV48922.2021.01346

Hanul Kim, Mihir Jain, Jun-Tae Lee, Sungrack Yun, F. Porikli

Pages: 13699-13708

Citations: 15

Abstract

Efficient action recognition has become crucial to extend the success of action recognition to many real-world applications. Contrary to most existing methods, which mainly focus on selecting salient frames to reduce the computation cost, we focus more on making the most of the selected frames. To this end, we employ two networks of different capabilities that operate in tandem to efficiently recognize actions. Given a video, the lighter network processes more frames while the heavier one only processes a few. In order to enable the effective interaction between the two, we propose dynamic knowledge propagation based on a cross-attention mechanism. This is the main component of our framework that is essentially a student-teacher architecture, but as the teacher model continues to interact with the student model during inference, we call it a dynamic student-teacher framework. Through extensive experiments, we demonstrate the effectiveness of each component of our framework. Our method outperforms competing state-of-the-art methods on two video datasets: ActivityNet-v1.3 and Mini-Kinetics.
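
The interaction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the light (student) network's per-frame features act as queries and the heavy (teacher) network's features act as keys and values, it uses identity projections instead of learned ones, and it adds a residual connection followed by average pooling. The function names `propagate` and `video_feature` and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def propagate(student_feats, teacher_feats):
    """Cross-attention from student frames (queries) to teacher frames (keys/values).

    Assumption: identity projections are used for brevity; a trained model
    would learn separate query/key/value projections.
    Returns one enriched feature vector per student frame.
    """
    dim = len(teacher_feats[0])
    scale = 1.0 / math.sqrt(dim)
    enriched = []
    for q in student_feats:
        # Attention weights of this student frame over all teacher frames.
        weights = softmax([dot(q, k) * scale for k in teacher_feats])
        # Weighted sum of teacher features (the propagated knowledge).
        context = [
            sum(w * v[d] for w, v in zip(weights, teacher_feats))
            for d in range(dim)
        ]
        # Residual connection: keep the student's own evidence.
        enriched.append([qi + ci for qi, ci in zip(q, context)])
    return enriched


def video_feature(student_feats, teacher_feats):
    """Average-pool the enriched student features into one video-level vector."""
    enriched = propagate(student_feats, teacher_feats)
    n = len(enriched)
    return [sum(f[d] for f in enriched) / n for d in range(len(enriched[0]))]


# Toy input: the light network sees 4 frames, the heavy network only 2.
student = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
teacher = [[2.0, 0.0], [0.0, 2.0]]

enriched = propagate(student, teacher)
pooled = video_feature(student, teacher)
```

In this sketch the teacher runs on fewer frames than the student, yet every student frame receives teacher information through the attention weights, which is the sense in which the teacher keeps interacting with the student at inference time.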



