{"title":"Recognizing Actions in Wearable-Camera Videos by Training Classifiers on Fixed-Camera Videos","authors":"Yang Mi, Kang Zheng, Song Wang","doi":"10.1145/3206025.3206041","DOIUrl":null,"url":null,"abstract":"Recognizing human actions in wearable camera videos, such as videos taken by GoPro or Google Glass, can benefit many multimedia applications. By mixing the complex and non-stop motion of the camera, motion features extracted from videos of the same action may show very large variation and inconsistency. It is very difficult to collect sufficient videos to cover all such variations and use them to train action classifiers with good generalization ability. In this paper, we develop a new approach to train action classifiers on a relatively smaller set of fixed-camera videos with different views, and then apply them to recognize actions in wearable-camera videos. In this approach, we temporally divide the input video into many shorter video segments and transform the motion features to stable ones in each video segment, in terms of a fixed view defined by an anchor frame in the segment. Finally, we use sparse coding to estimate the action likelihood in each segment, followed by combining the likelihoods from all the video segments for action recognition. We conduct experiments by training on a set of fixed-camera videos and testing on a set of wearable-camera videos, with very promising results.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3206025.3206041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Recognizing human actions in wearable-camera videos, such as videos taken by a GoPro or Google Glass, can benefit many multimedia applications. Because the complex, non-stop motion of the camera is mixed into the observed motion, motion features extracted from videos of the same action may show very large variation and inconsistency. It is very difficult to collect enough videos to cover all such variations and use them to train action classifiers with good generalization ability. In this paper, we develop a new approach that trains action classifiers on a relatively small set of fixed-camera videos with different views and then applies them to recognize actions in wearable-camera videos. In this approach, we temporally divide the input video into many shorter segments and, within each segment, transform the motion features into stable ones with respect to a fixed view defined by an anchor frame in that segment. Finally, we use sparse coding to estimate the action likelihood in each segment and then combine the likelihoods from all segments for action recognition. We conduct experiments by training on a set of fixed-camera videos and testing on a set of wearable-camera videos, with very promising results.
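The abstract does not give implementation details, but the per-segment sparse-coding step it describes can be sketched roughly as below. This is a minimal illustration under assumptions not stated in the paper: the class-blocked dictionary, the mapping from class-wise reconstruction error to a likelihood (softmax over negative error), and the averaging of per-segment likelihoods are all assumptions, and the function and variable names (`segment_likelihoods`, etc.) are hypothetical. Real, view-stabilized motion features would replace the random stand-ins.

```python
# Sketch of sparse-coding-based per-segment action likelihoods (not the
# authors' implementation). Atoms in the dictionary are grouped by action
# class; each segment's feature is coded, reconstructed per class, and the
# reconstruction errors are turned into a likelihood (assumption: softmax
# over negative error). Segment likelihoods are averaged (assumption).
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
n_classes, atoms_per_class, feat_dim = 3, 20, 64

# Dictionary: one block of atoms per action class (rows are atoms).
dictionary = rng.standard_normal((n_classes * atoms_per_class, feat_dim))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
atom_class = np.repeat(np.arange(n_classes), atoms_per_class)

coder = SparseCoder(dictionary=dictionary,
                    transform_algorithm="lasso_lars",
                    transform_alpha=0.1)

def segment_likelihoods(features):
    """features: (n_segments, feat_dim) view-stabilized motion features."""
    codes = coder.transform(features)              # (n_segments, n_atoms)
    errors = np.empty((features.shape[0], n_classes))
    for c in range(n_classes):
        mask = atom_class == c
        recon = codes[:, mask] @ dictionary[mask]  # class-restricted reconstruction
        errors[:, c] = np.linalg.norm(features - recon, axis=1)
    logits = -errors                               # lower error -> higher likelihood
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Combine per-segment likelihoods into a video-level prediction.
segments = rng.standard_normal((8, feat_dim))      # stand-in for real segment features
video_scores = segment_likelihoods(segments).mean(axis=0)
print("predicted action class:", int(video_scores.argmax()))
```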