Action Recognition by Jointly Using Video Proposal and Trajectory

Proceedings of the 2nd International Conference on Vision, Image and Signal Processing. Pub Date: 2018-08-27. DOI: 10.1145/3271553.3271563

Lei Qi, Xiaoqiang Lu, Xuelong Li

Citations: 5

Action Recognition by Jointly Using Video Proposal and Trajectory

Human action recognition in videos is a popular and challenging research topic in the computer vision community. In recent years, trajectory-based methods have proven effective for action recognition. However, because trajectories are generated around motion regions, trajectory-based methods often attend only to regions of a video with high motion saliency and ignore motionless but semantically meaningful objects. To compensate for this shortcoming, this paper uses video proposals for their ability to discover semantic objects. In the proposed method, video proposals and trajectories are extracted together so that both motion information and object information are captured. The method consists of three steps: 1) trajectories and video proposals are extracted from the video to capture motion information and object information, respectively; 2) a trained Convolutional Neural Network (CNN) model is used to describe the extracted trajectories and video proposals; 3) a holistic representation of the video is constructed with the Fisher Vector model and fed to a classifier to obtain the action label. The complementarity between trajectories and video proposals gives the proposed method discriminative power across different kinds of actions. The method is evaluated on UCF101 and HMDB51, where promising results demonstrate its effectiveness.
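Step 3 of the pipeline above can be illustrated with a minimal sketch of Fisher Vector encoding. The abstract does not give the details of the encoding, so everything here is an assumption made for illustration: a diagonal-covariance Gaussian mixture model (GMM) whose parameters are already fitted, gradients taken with respect to the means and standard deviations only, and power plus L2 normalization of the result. The function names `gmm_posteriors` and `fisher_vector` and all numeric values are hypothetical, and the short lists stand in for CNN descriptors of trajectories or video proposals.

```python
import math


def gmm_posteriors(x, weights, means, sigmas):
    """Soft-assign one descriptor x to each diagonal-covariance Gaussian."""
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((xi - m) / s) ** 2
        log_p.append(ll)
    top = max(log_p)  # subtract the max for numerical stability
    exp_p = [math.exp(v - top) for v in log_p]
    total = sum(exp_p)
    return [v / total for v in exp_p]


def fisher_vector(descriptors, weights, means, sigmas):
    """Encode a set of local descriptors as one vector of length 2 * K * D."""
    n = len(descriptors)
    k, d = len(weights), len(means[0])
    grad_mu = [[0.0] * d for _ in range(k)]
    grad_sd = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        gamma = gmm_posteriors(x, weights, means, sigmas)
        for j in range(k):
            for i in range(d):
                z = (x[i] - means[j][i]) / sigmas[j][i]
                grad_mu[j][i] += gamma[j] * z              # gradient w.r.t. mean
                grad_sd[j][i] += gamma[j] * (z * z - 1.0)  # gradient w.r.t. std
    fv = []
    for j in range(k):
        fv += [g / (n * math.sqrt(weights[j])) for g in grad_mu[j]]
    for j in range(k):
        fv += [g / (n * math.sqrt(2.0 * weights[j])) for g in grad_sd[j]]
    # Assumed normalization: signed square root, then unit L2 norm.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


# Toy GMM (hypothetical values): K = 2 components, D = 2 dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
# Stand-ins for CNN descriptors of trajectories or video proposals.
descriptors = [[0.2, -0.1], [2.9, 3.3], [0.5, 0.4]]
video_fv = fisher_vector(descriptors, weights, means, sigmas)
print(len(video_fv))  # 2 * K * D = 8
```

In a full system the resulting fixed-length vector would be passed to a classifier to predict the action label. The abstract does not say which classifier is used or how the trajectory and proposal representations are combined, so those parts are left out of the sketch.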
