Recognizing actions using salient features
Liang Wang, Debin Zhao
2011 IEEE 13th International Workshop on Multimedia Signal Processing, 2011-12-01
DOI: 10.1109/MMSP.2011.6093832 (https://doi.org/10.1109/MMSP.2011.6093832)
Citations: 6
Abstract
Towards a compact video feature representation, we propose a novel feature selection methodology for action recognition based on the saliency maps of videos. Since saliency maps measure the perceptual importance of the pixels and regions in videos, selecting features using saliency maps enables us to find a feature representation that covers the informative parts of a video. Because saliency detection is a bottom-up procedure, some appearance changes or motions that are irrelevant to actions may also be detected as salient regions. To further improve the purity of the feature representation, we prune these irrelevant salient regions using the distribution of saliency values and the spatio-temporal distribution of the salient regions. Extensive experiments demonstrate that the proposed feature selection method substantially improves the performance of the bag-of-video-words model on action recognition under three different attention models: a static attention model, a motion attention model, and their combination.
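The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the selection rule, so the function name `select_salient_features`, the `keep_ratio` parameter, and the rule "keep the top fraction of features ranked by saliency at their location" are all assumptions made for illustration. The pruning of irrelevant salient regions is not covered here.

```python
import numpy as np


def select_salient_features(points, descriptors, saliency, keep_ratio=0.5):
    """Keep the features located at the most salient positions.

    points      : (N, 3) integer array of (t, y, x) feature locations.
    descriptors : (N, D) array of local feature descriptors.
    saliency    : (T, H, W) array of per-pixel saliency values.
    keep_ratio  : hypothetical parameter, fraction of features to keep.
    """
    points = np.asarray(points)
    descriptors = np.asarray(descriptors)

    # Look up the saliency value at each feature location.
    t, y, x = points[:, 0], points[:, 1], points[:, 2]
    scores = saliency[t, y, x]

    # Rank features by saliency (highest first) and keep the top fraction.
    n_keep = max(1, int(round(keep_ratio * len(points))))
    top = np.argsort(-scores, kind="stable")[:n_keep]

    # Restore the original feature order among the kept features.
    keep = np.sort(top)
    return points[keep], descriptors[keep], scores[keep]


# Toy example: four features in a two-frame, 4x4 saliency volume.
saliency = np.zeros((2, 4, 4))
saliency[0, 1, 1] = 0.9
saliency[1, 2, 3] = 0.5
points = np.array([[0, 1, 1], [0, 0, 0], [1, 2, 3], [1, 3, 3]])
descriptors = np.arange(8).reshape(4, 2)

kept_points, kept_descriptors, kept_scores = select_salient_features(
    points, descriptors, saliency, keep_ratio=0.5
)
```

In a bag-of-video-words pipeline, the kept descriptors would then be quantized against a codebook in place of the full descriptor set. The saliency volume itself could come from a static model, a motion model, or a combination of the two, as in the three settings the abstract mentions.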