Wanru Xu, Z. Miao, Jian Zhang, Qiang Zhang, Haohao Wu
Title: Spatial-Temporal Context for Action Recognition Combined with Confidence and Contribution Weight
Published in: 2013 2nd IAPR Asian Conference on Pattern Recognition
Publication date: 2013-11-05
DOI: 10.1109/ACPR.2013.114 (https://doi.org/10.1109/ACPR.2013.114)
Citation count: 1
Abstract
In this paper, we propose a new method for human action analysis in videos. In our view, a video sequence of a human action can be modeled by the distribution of its features over the spatial-temporal domain. Relationships between features and each defined action are also explored to form discriminative feature sets. We first capture contextual correlations between local features through multiple windows. We then mine confidences from association rules and learn contributions from a trained SVM, both based on sample videos. Finally, by analyzing the feature distribution and the feature interactions over the spatial-temporal domain, we combine the contextual correlations with the relationships between words and their related actions to derive weights for the bag of feature words used in action matching. In most cases, our experiments indicate that the new method outperforms previously published results on the Weizmann and KTH datasets.
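The abstract does not give the weighting formulas, so the following is only a minimal sketch of one ingredient it names: the confidence of an association rule between a visual word and an action. It assumes the standard definition of rule confidence (the fraction of training videos containing a word that carry a given action label) and uses it to reweight a bag-of-words histogram. The function names `rule_confidence` and `weighted_histogram` and the toy data are hypothetical, and the multi-window context and SVM-derived contribution terms are left out.

```python
from collections import Counter


def rule_confidence(videos, labels):
    """Confidence of the rule `word -> action` (standard association-rule
    definition, assumed here): videos containing the word AND labelled with
    the action, divided by videos containing the word."""
    contains = Counter()  # number of videos in which each word occurs
    joint = Counter()     # number of videos with (word, action) together
    for words, action in zip(videos, labels):
        for w in set(words):  # count each word once per video
            contains[w] += 1
            joint[(w, action)] += 1
    return {(w, a): n / contains[w] for (w, a), n in joint.items()}


def weighted_histogram(words, action, confidence):
    """Bag-of-words histogram in which every occurrence of a word is weighted
    by its confidence for the candidate action, then L1-normalised."""
    hist = Counter()
    for w in words:
        hist[w] += confidence.get((w, action), 0.0)
    total = sum(hist.values())
    return {w: v / total for w, v in hist.items()} if total else {}


# Toy data (hypothetical): each video is a list of quantised visual-word ids.
videos = [[0, 1, 1], [0, 2], [2, 2, 3]]
labels = ["walk", "walk", "run"]

conf = rule_confidence(videos, labels)
hist = weighted_histogram([0, 2, 2], "walk", conf)
```

In this toy set, word 0 appears only in "walk" videos and so gets full weight for that action, while word 2 is split evenly between "walk" and "run" and is down-weighted. Under the same assumptions, a contribution term learned from an SVM could be multiplied into the per-word weight.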