Title: Human Action Recognition by Concatenation of Spatio-Temporal 3D SIFT and CoHOG Descriptors using Bag of Visual Words
Authors: R. Divya Rani, C. J. Prabhakar
DOI: 10.1109/DISCOVER55800.2022.9974645
Published in: 2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER)
Publication date: 2022-10-14
Citations: 2
Abstract
In this paper, a Spatio-Temporal Interest Point (STIP)-based technique is presented to recognize human actions using a Bag of Visual Words (BoVW) representation. First, we extract densely sampled local 3-Dimensional Scale Invariant Feature Transform (3D SIFT) and global Co-occurrence Histograms of Oriented Gradients (CoHOG) feature descriptors from input video sequences. Discriminative features are selected by applying the Linear Discriminant Analysis (LDA) dimensionality reduction technique. The optimal features selected from the 3D SIFT and CoHOG descriptors are concatenated to produce a single feature vector. To generate the visual vocabulary, we use k-means++ clustering. Prominent visual words are identified by the Term Frequency-Inverse Document Frequency (TF-IDF) weighting scheme and are used to generate histograms. A Support Vector Machine (SVM) classifier is used for action classification. The proposed method is evaluated on two popular human action recognition datasets, KTH and Weizmann. The experimental results for our proposed method are compared with state-of-the-art human action recognition techniques, demonstrating that the proposed method achieves the highest recognition accuracy: 98.00% on the KTH dataset and 98.7% on the Weizmann dataset.
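The BoVW pipeline the abstract describes (k-means++ vocabulary, TF-IDF-weighted visual-word histograms, SVM classification) can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the authors' implementation: the random per-video descriptor matrices stand in for the LDA-reduced, concatenated 3D SIFT + CoHOG features, and the vocabulary size and SVM kernel are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins: each "video" is a matrix of local descriptors
# (in the paper these would be LDA-reduced 3D SIFT + CoHOG vectors).
n_videos, n_classes, dim, vocab_size = 20, 2, 16, 8
videos = [rng.normal(size=(rng.integers(30, 60), dim)) for _ in range(n_videos)]
labels = rng.integers(0, n_classes, size=n_videos)

# 1) Build the visual vocabulary with k-means++ over all pooled descriptors.
kmeans = KMeans(n_clusters=vocab_size, init="k-means++", n_init=10,
                random_state=0)
kmeans.fit(np.vstack(videos))

# 2) Quantize each video's descriptors into a visual-word count histogram.
counts = np.array([np.bincount(kmeans.predict(v), minlength=vocab_size)
                   for v in videos])

# 3) Apply TF-IDF weighting to emphasize the prominent visual words.
tfidf = TfidfTransformer().fit_transform(counts).toarray()

# 4) Classify the weighted histograms with an SVM.
clf = SVC(kernel="linear").fit(tfidf, labels)
predictions = clf.predict(tfidf)
```

Treating each video as a "document" of visual words is what makes the TF-IDF step well defined; the transformer down-weights words that occur in nearly every video and so carry little discriminative information.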