Learning Deep Trajectory Descriptor for action recognition in videos using deep neural networks
Authors: Yemin Shi, Wei Zeng, Tiejun Huang, Yaowei Wang
Published in: 2015 IEEE International Conference on Multimedia and Expo (ICME), 2015-06-01
DOI: https://doi.org/10.1109/ICME.2015.7177461
Human action recognition is widely recognized as a challenging task because human action is difficult to characterize effectively in a complex scene. Recent studies have shown that dense-trajectory-based methods can achieve state-of-the-art recognition results on some challenging datasets. However, these methods often represent each dense trajectory as a vector of coordinates, and so lose the structural relationship between different trajectories. To address this problem, this paper proposes a novel Deep Trajectory Descriptor (DTD) for action recognition. First, we extract dense trajectories from multiple consecutive frames and project them onto a canvas. This yields a “trajectory texture” image that can effectively characterize the relative motion in these frames. From these trajectory texture images, a deep neural network (DNN) learns a more compact and powerful representation of dense trajectories. In the action recognition system, the DTD descriptor, together with other non-trajectory features such as HOG, HOF and MBH, provides an effective way to characterize human action from various aspects. Experimental results show that our system can statistically outperform several state-of-the-art approaches, with an average accuracy of 95.6% on KTH and an accuracy of 92.14% on UCF50.
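The projection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `trajectory_texture`, the single-channel canvas, the fixed pixel value of 255, and the use of linear interpolation between consecutive trajectory points are all assumptions made for illustration. Trajectories are assumed to be given already, as arrays of (x, y) points tracked over consecutive frames.

```python
import numpy as np


def trajectory_texture(trajectories, height, width, samples_per_segment=8):
    """Rasterize trajectories onto one canvas (hypothetical helper).

    trajectories: iterable of arrays of shape (T, 2) holding (x, y) points,
    one row per frame. Returns a uint8 image of shape (height, width).
    """
    canvas = np.zeros((height, width), dtype=np.uint8)
    # Interpolation weights used to fill the gap between consecutive points.
    t = np.linspace(0.0, 1.0, samples_per_segment)
    for traj in trajectories:
        pts = np.asarray(traj, dtype=np.float64)
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            xs = np.rint(x0 + t * (x1 - x0)).astype(int)
            ys = np.rint(y0 + t * (y1 - y0)).astype(int)
            # Drop samples that fall outside the canvas.
            inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
            canvas[ys[inside], xs[inside]] = 255
    return canvas
```

Because all trajectories from the same group of frames are drawn on a shared canvas, their relative positions are preserved in the image, which is the structural information the abstract says is lost when each trajectory is stored as an isolated coordinate vector. The resulting image could then be fed to an image-based network; the network itself is outside the scope of this sketch.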