Human Action Recognition: Pose-Based Attention Draws Focus to Hands

2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Pub Date: 2017-10-23. DOI: 10.1109/ICCVW.2017.77

Fabien Baradel, Christian Wolf, J. Mille


Citations: 91

Abstract

We propose a new spatio-temporal attention-based mechanism for human action recognition that automatically attends to the most important human hands and detects the most discriminative moments in an action. Attention is handled in a recurrent manner using a Recurrent Neural Network (RNN) and is fully differentiable. In contrast to standard soft-attention mechanisms, our approach does not use the hidden RNN state as input to the attention model. Instead, attention distributions are drawn using external information: human articulated pose. We performed an extensive ablation study to show the strengths of this approach, and in particular studied the conditioning aspect of the attention mechanism. We evaluate the method on the largest currently available human action recognition dataset, NTU-RGB+D, and report state-of-the-art results. Another advantage of our model is a degree of explainability: the spatial and temporal attention distributions at test time make it possible to study and verify which parts of the input data the method focuses on.
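The central idea of the abstract, attention weights computed from pose rather than from the recognizer's hidden state, can be illustrated with a small sketch. This is not the authors' implementation: the single linear scoring layer, the function names (`pose_conditioned_attention`, `attend`), and the sizes (four hand crops, a six-value pose vector, three-value features) are assumptions chosen only to keep the example short.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pose_conditioned_attention(pose, weights, bias):
    """Map a flattened pose vector to one attention weight per hand crop.

    Hypothetical single linear layer: the scores depend only on the pose,
    never on the hidden state of the recurrent recognizer.
    """
    scores = [
        sum(w * p for w, p in zip(row, pose)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


def attend(hand_features, attention):
    """Weighted sum of per-hand feature vectors (a soft, differentiable glimpse)."""
    dim = len(hand_features[0])
    return [
        sum(a * feat[i] for a, feat in zip(attention, hand_features))
        for i in range(dim)
    ]


# Illustrative sizes and random values; none of these come from the paper.
rng = random.Random(0)
num_hands, pose_dim, feat_dim = 4, 6, 3
pose = [rng.uniform(-1, 1) for _ in range(pose_dim)]
weights = [[rng.uniform(-1, 1) for _ in range(pose_dim)] for _ in range(num_hands)]
bias = [0.0] * num_hands
hand_features = [[rng.uniform(0, 1) for _ in range(feat_dim)] for _ in range(num_hands)]

attention = pose_conditioned_attention(pose, weights, bias)
glimpse = attend(hand_features, attention)
```

Because `attention` is a probability distribution over hand crops, it can be inspected directly at test time to see which hand the model weighted most, which is the kind of explainability the abstract refers to.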
