M2A:精确视频动作识别的动作感知注意力

2022 19th Conference on Robots and Vision (CRV) Pub Date : 2021-11-18 DOI:10.1109/CRV55824.2022.00019

Brennan Gebotys, Alexander Wong, David A Clausi

{"title":"M2A:精确视频动作识别的动作感知注意力","authors":"Brennan Gebotys, Alexander Wong, David A Clausi","doi":"10.1109/CRV55824.2022.00019","DOIUrl":null,"url":null,"abstract":"Advancements in attention mechanisms have led to significant performance improvements in a variety of areas in machine learning due to its ability to enable the dynamic modeling of temporal sequences. A particular area in computer vision that is likely to benefit greatly from the incorporation of attention mechanisms in video action recognition. However, much of the current research's focus on attention mechanisms have been on spatial and temporal attention, which are unable to take advantage of the inherent motion found in videos. Motivated by this, we develop a new attention mechanism called Motion Aware Attention (M2A) that explicitly incorporates motion characteris-tics. More specifically, M2A extracts motion information between consecutive frames and utilizes attention to focus on the motion patterns found across frames to accurately recognize actions in videos. The proposed M2A mechanism is simple to implement and can be easily incorporated into any neural network backbone architecture. We show that incorporating motion mechanisms with attention mechanisms using the proposed M2A mechanism can lead to a $+15\\%$ to $+26\\%$ improvement in top-1 accuracy across different backbone architectures, with only a small in-crease in computational complexity. We further compared the performance of M2A with other state-of-the-art motion and at-tention mechanisms on the Something-Something V1 video action recognition benchmark. Experimental results showed that M2A can lead to further improvements when combined with other temporal mechanisms and that it outperforms other motion-only or attention-only mechanisms by as much as $+60\\%$ in top-1 accuracy for specific classes in the benchmark. We make our code available at: https://github.com/gebob19/M2A.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"M2A: Motion Aware Attention for Accurate Video Action Recognition\",\"authors\":\"Brennan Gebotys, Alexander Wong, David A Clausi\",\"doi\":\"10.1109/CRV55824.2022.00019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Advancements in attention mechanisms have led to significant performance improvements in a variety of areas in machine learning due to its ability to enable the dynamic modeling of temporal sequences. A particular area in computer vision that is likely to benefit greatly from the incorporation of attention mechanisms in video action recognition. However, much of the current research's focus on attention mechanisms have been on spatial and temporal attention, which are unable to take advantage of the inherent motion found in videos. Motivated by this, we develop a new attention mechanism called Motion Aware Attention (M2A) that explicitly incorporates motion characteris-tics. More specifically, M2A extracts motion information between consecutive frames and utilizes attention to focus on the motion patterns found across frames to accurately recognize actions in videos. The proposed M2A mechanism is simple to implement and can be easily incorporated into any neural network backbone architecture. We show that incorporating motion mechanisms with attention mechanisms using the proposed M2A mechanism can lead to a $+15\\\\%$ to $+26\\\\%$ improvement in top-1 accuracy across different backbone architectures, with only a small in-crease in computational complexity. We further compared the performance of M2A with other state-of-the-art motion and at-tention mechanisms on the Something-Something V1 video action recognition benchmark. Experimental results showed that M2A can lead to further improvements when combined with other temporal mechanisms and that it outperforms other motion-only or attention-only mechanisms by as much as $+60\\\\%$ in top-1 accuracy for specific classes in the benchmark. We make our code available at: https://github.com/gebob19/M2A.\",\"PeriodicalId\":131142,\"journal\":{\"name\":\"2022 19th Conference on Robots and Vision (CRV)\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 19th Conference on Robots and Vision (CRV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CRV55824.2022.00019\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 19th Conference on Robots and Vision (CRV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CRV55824.2022.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

由于注意机制能够实现时间序列的动态建模，因此在机器学习的各个领域中，注意机制的进步导致了显著的性能改进。计算机视觉中的一个特殊领域，可能会从视频动作识别中加入注意机制中获益良多。然而，目前对注意力机制的研究大多集中在空间和时间注意力上，这无法利用视频中固有的运动。受此启发，我们开发了一种新的注意力机制，称为运动感知注意力(M2A)，明确地结合了运动特征。更具体地说，M2A提取连续帧之间的运动信息，并利用注意力集中在跨帧发现的运动模式上，以准确识别视频中的动作。提出的M2A机制实现简单，可以很容易地集成到任何神经网络骨干架构中。我们表明，使用所提出的M2A机制将运动机制与注意力机制结合起来，可以在不同的骨干架构中导致top-1精度的+ 15%到+ 26%的提高，而计算复杂性只有很小的增加。我们进一步在Something-Something V1视频动作识别基准上比较了M2A与其他最先进的运动和注意力机制的性能。实验结果表明，当与其他时间机制相结合时，M2A可以带来进一步的改进，并且在基准测试中的特定类别中，它比其他仅运动或仅注意机制在前1的准确率上高出高达60%。我们在https://github.com/gebob19/M2A上提供了我们的代码。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

M2A: Motion Aware Attention for Accurate Video Action Recognition

Advancements in attention mechanisms have led to significant performance improvements in a variety of areas in machine learning due to its ability to enable the dynamic modeling of temporal sequences. A particular area in computer vision that is likely to benefit greatly from the incorporation of attention mechanisms in video action recognition. However, much of the current research's focus on attention mechanisms have been on spatial and temporal attention, which are unable to take advantage of the inherent motion found in videos. Motivated by this, we develop a new attention mechanism called Motion Aware Attention (M2A) that explicitly incorporates motion characteris-tics. More specifically, M2A extracts motion information between consecutive frames and utilizes attention to focus on the motion patterns found across frames to accurately recognize actions in videos. The proposed M2A mechanism is simple to implement and can be easily incorporated into any neural network backbone architecture. We show that incorporating motion mechanisms with attention mechanisms using the proposed M2A mechanism can lead to a $+15\%$ to $+26\%$ improvement in top-1 accuracy across different backbone architectures, with only a small in-crease in computational complexity. We further compared the performance of M2A with other state-of-the-art motion and at-tention mechanisms on the Something-Something V1 video action recognition benchmark. Experimental results showed that M2A can lead to further improvements when combined with other temporal mechanisms and that it outperforms other motion-only or attention-only mechanisms by as much as $+60\%$ in top-1 accuracy for specific classes in the benchmark. We make our code available at: https://github.com/gebob19/M2A.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 19th Conference on Robots and Vision (CRV)

自引率

0.00%

发文量