{"title":"动作的傅里叶形状频率词","authors":"Bishwajit Sharma, K. Venkatesh, A. Mukerjee","doi":"10.1109/ICIIP.2011.6108939","DOIUrl":null,"url":null,"abstract":"Actions consist of short shape-motion fragments which recur in a seemingly unique sequence. We propose that these short fragments may constitute a concise vocabulary for actions. Models based on such “words” sometimes use the bag of words paradigm, which ignores sequence information. Also, despite the well-known utility of Fourier and similar features for temporal modelling, Fourier models have not received due attention to model action words until recently. Hence, we employ shape-frequency features as a temporally windowed Fourier transform to capture local motion and shape information. Unsupervised clustering discovers the naturally occurring modes (words) of these features. Each labelled video can thus be represented as a sequence of cluster transitions. Though different actions share common words, we observe that the word sequences are different for different actions, enabling easy discrimination. We evaluate the model on the Weizmann action dataset [1] and achieve 96.7% classification accuracy, and show how it compares to other similar algorithms.","PeriodicalId":201779,"journal":{"name":"2011 International Conference on Image Information Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Fourier shape-frequency words for actions\",\"authors\":\"Bishwajit Sharma, K. Venkatesh, A. Mukerjee\",\"doi\":\"10.1109/ICIIP.2011.6108939\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Actions consist of short shape-motion fragments which recur in a seemingly unique sequence. We propose that these short fragments may constitute a concise vocabulary for actions. Models based on such “words” sometimes use the bag of words paradigm, which ignores sequence information. Also, despite the well-known utility of Fourier and similar features for temporal modelling, Fourier models have not received due attention to model action words until recently. Hence, we employ shape-frequency features as a temporally windowed Fourier transform to capture local motion and shape information. Unsupervised clustering discovers the naturally occurring modes (words) of these features. Each labelled video can thus be represented as a sequence of cluster transitions. Though different actions share common words, we observe that the word sequences are different for different actions, enabling easy discrimination. 
We evaluate the model on the Weizmann action dataset [1] and achieve 96.7% classification accuracy, and show how it compares to other similar algorithms.\",\"PeriodicalId\":201779,\"journal\":{\"name\":\"2011 International Conference on Image Information Processing\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Image Information Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIIP.2011.6108939\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Image Information Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIIP.2011.6108939","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Actions consist of short shape-motion fragments that recur in a seemingly unique sequence. We propose that these short fragments may constitute a concise vocabulary for actions. Models based on such "words" often use the bag-of-words paradigm, which ignores sequence information. Moreover, despite the well-known utility of Fourier and similar features for temporal modelling, Fourier-based models had, until recently, received little attention for modelling action words. Hence, we employ shape-frequency features, computed as a temporally windowed Fourier transform, to capture local motion and shape information. Unsupervised clustering discovers the naturally occurring modes (words) of these features. Each labelled video can thus be represented as a sequence of cluster transitions. Though different actions share common words, we observe that the word sequences differ across actions, enabling easy discrimination. We evaluate the model on the Weizmann action dataset [1], achieve 96.7% classification accuracy, and show how it compares to similar algorithms.
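The abstract outlines a pipeline: per-frame shape descriptors are turned into temporally windowed Fourier features, an unsupervised clusterer groups these into a vocabulary of "words", and each video is then represented as a sequence of word labels. The sketch below illustrates that flow under assumed choices; it is not the authors' implementation, and the feature dimensionality, window length, step size, and cluster count are illustrative placeholders.

```python
# Minimal sketch of the described pipeline, with assumed parameters:
# shape features per frame -> windowed temporal Fourier magnitudes ->
# k-means vocabulary of "words" -> each video as a label sequence.
import numpy as np
from sklearn.cluster import KMeans

def windowed_fourier_features(shape_feats, win=16, step=4):
    """shape_feats: (T, D) per-frame shape descriptors for one video.
    Returns an (N, K*D) array of Fourier magnitude spectra over
    overlapping temporal windows."""
    T, D = shape_feats.shape
    windows = []
    for start in range(0, T - win + 1, step):
        chunk = shape_feats[start:start + win]        # (win, D)
        spec = np.abs(np.fft.rfft(chunk, axis=0))     # temporal frequencies per dim
        windows.append(spec.ravel())
    return np.array(windows)

# Toy data standing in for real shape descriptors of several videos.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(60, 8)) for _ in range(5)]

# Build the vocabulary by clustering windowed Fourier features pooled over videos.
all_fragments = np.vstack([windowed_fourier_features(v) for v in videos])
vocab = KMeans(n_clusters=10, n_init=10, random_state=0).fit(all_fragments)

# Each video becomes a sequence of cluster labels ("words"); a classifier could
# then compare these label sequences across actions rather than bags of words.
word_sequences = [vocab.predict(windowed_fourier_features(v)) for v in videos]
print(word_sequences[0])
```

In this reading, keeping the labels as an ordered sequence (rather than a histogram) is what preserves the sequence information the abstract argues the bag-of-words paradigm discards.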