Recognition by detection: Perceiving human motion through part-configured feature maps
Lei Wang, Jun Wu, Zhimin Zhou, Yuncai Liu, Xu Zhao
2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), published 2014-07-14. DOI: 10.1109/ICMEW.2014.6890599
Citations: 1
Abstract
Visually perceiving human motion at the semantic level is an important yet challenging problem in the multimedia area. In this work, we propose a novel approach that maps low-level responses from visual detection to semantically meaningful descriptions of human actions. The feature map is built from the output of deformable part model detection, in which critical information about body-part configuration under specific human actions is implicitly contained. We map the filter responses of the detectors to an effective feature description that simultaneously encodes the position and appearance information of the root and of every body part. Statistically, the obtained feature map captures the significance of the relative configuration of body parts and is therefore robust to false detections produced by individual part detectors. We conduct comprehensive experiments, and the results show that the method generates discriminative action features and achieves remarkable performance in most cases.
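For illustration, the following is a minimal sketch, not the authors' implementation, of one way a part-configured feature could be assembled from deformable part model (DPM) detection output. The `Detection` container, its fields, and the normalization scheme are assumptions made for this example; the paper's exact encoding may differ.

```python
# Sketch: encode part positions relative to the root box (configuration)
# together with filter responses (appearance) into one fixed-length vector.
# All field names below are assumptions, not the paper's interface.

from dataclasses import dataclass
import numpy as np


@dataclass
class Detection:
    root_box: np.ndarray    # (x1, y1, x2, y2) of the root filter
    part_boxes: np.ndarray  # (P, 4) boxes of the P part filters
    part_scores: np.ndarray # (P,) part filter responses (appearance cues)
    root_score: float       # root filter response


def part_configured_feature(det: Detection) -> np.ndarray:
    """Concatenate relative part configuration and filter responses."""
    x1, y1, x2, y2 = det.root_box
    root_w = max(x2 - x1, 1e-6)
    root_h = max(y2 - y1, 1e-6)

    # Part centers expressed relative to the root box and normalized by its
    # size, so the descriptor reflects configuration rather than image location.
    cx = (det.part_boxes[:, 0] + det.part_boxes[:, 2]) / 2.0
    cy = (det.part_boxes[:, 1] + det.part_boxes[:, 3]) / 2.0
    rel = np.stack([(cx - x1) / root_w, (cy - y1) / root_h], axis=1)

    # Position (configuration) and filter responses (appearance) of the root
    # and every part, packed into a single feature vector of length 3P + 1.
    return np.concatenate([rel.ravel(), det.part_scores, [det.root_score]])
```

A standard classifier (for instance a linear SVM) trained on such vectors would then assign action labels; the snippet is only meant to convey the structure of a position-plus-appearance encoding driven by part detections.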