Human Action Recognition by Concatenation of Spatio-Temporal 3D SIFT and CoHOG Descriptors using Bag of Visual Words

R. Divya Rani, C. J. Prabhakar
{"title":"基于视觉词袋的时空三维SIFT和CoHOG描述符串联的人体动作识别","authors":"R. Divya Rani, C. J. Prabhakar","doi":"10.1109/DISCOVER55800.2022.9974645","DOIUrl":null,"url":null,"abstract":"In this paper, Spatio-Temporal Interest Points (STIPs) based technique is presented to recognize human actions using Bag of Visual Words (BoVW) representation. First, we extract densely sampled local 3-Dimensional Scale Invariant Feature Transform (3D SIFT) and global Co-occurrence Histograms of Oriented Gradients (CoHOG) feature descriptors from input video sequences. The discriminative features are selected by applying the Linear Discriminant Analysis (LDA) dimensionality reduction technique. The optimal features selected from both 3D SIFT and CoHOG features are concatenated to produce a single feature vector. To generate visual vocabulary we used k-means++ clustering. The prominent visual words are considered by the Term-Frequency Inverse-Document Frequency (TF.IDF) weighing scheme and are used to generate histograms. The Support Vector Machine (SVM) classifier is used for action classification. The proposed method is evaluated using two popular human action recognition datasets, such as the KTH dataset, and the Weizmann dataset. The experimental results obtained for our proposed method are compared with the state-of-the-art human action recognition techniques which demonstrate that the proposed method achieves the highest recognition accuracy 98.00% for KTH dataset and 98.7% for Weizmann dataset.","PeriodicalId":264177,"journal":{"name":"2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics ( DISCOVER)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Human Action Recognition by Concatenation of Spatio-Temporal 3D SIFT and CoHOG Descriptors using Bag of Visual Words\",\"authors\":\"R. Divya Rani, C. J. Prabhakar\",\"doi\":\"10.1109/DISCOVER55800.2022.9974645\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, Spatio-Temporal Interest Points (STIPs) based technique is presented to recognize human actions using Bag of Visual Words (BoVW) representation. First, we extract densely sampled local 3-Dimensional Scale Invariant Feature Transform (3D SIFT) and global Co-occurrence Histograms of Oriented Gradients (CoHOG) feature descriptors from input video sequences. The discriminative features are selected by applying the Linear Discriminant Analysis (LDA) dimensionality reduction technique. The optimal features selected from both 3D SIFT and CoHOG features are concatenated to produce a single feature vector. To generate visual vocabulary we used k-means++ clustering. The prominent visual words are considered by the Term-Frequency Inverse-Document Frequency (TF.IDF) weighing scheme and are used to generate histograms. The Support Vector Machine (SVM) classifier is used for action classification. The proposed method is evaluated using two popular human action recognition datasets, such as the KTH dataset, and the Weizmann dataset. 
The experimental results obtained for our proposed method are compared with the state-of-the-art human action recognition techniques which demonstrate that the proposed method achieves the highest recognition accuracy 98.00% for KTH dataset and 98.7% for Weizmann dataset.\",\"PeriodicalId\":264177,\"journal\":{\"name\":\"2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics ( DISCOVER)\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics ( DISCOVER)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DISCOVER55800.2022.9974645\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics ( DISCOVER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DISCOVER55800.2022.9974645","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this paper, a Spatio-Temporal Interest Points (STIPs) based technique is presented to recognize human actions using a Bag of Visual Words (BoVW) representation. First, we extract densely sampled local 3-Dimensional Scale Invariant Feature Transform (3D SIFT) and global Co-occurrence Histograms of Oriented Gradients (CoHOG) feature descriptors from the input video sequences. Discriminative features are selected by applying the Linear Discriminant Analysis (LDA) dimensionality reduction technique. The optimal features selected from the 3D SIFT and CoHOG descriptors are concatenated to produce a single feature vector. To generate the visual vocabulary, we use k-means++ clustering. The prominent visual words are selected using the Term Frequency–Inverse Document Frequency (TF-IDF) weighting scheme and are used to generate histograms. A Support Vector Machine (SVM) classifier is used for action classification. The proposed method is evaluated on two popular human action recognition datasets: the KTH dataset and the Weizmann dataset. The experimental results obtained for our proposed method are compared with state-of-the-art human action recognition techniques, demonstrating that the proposed method achieves the highest recognition accuracy: 98.00% on the KTH dataset and 98.7% on the Weizmann dataset.
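The paper does not include an implementation, but the pipeline described above (LDA-reduced descriptors, concatenation, a k-means++ visual vocabulary, TF-IDF weighting, and an SVM) maps closely onto standard scikit-learn components. The sketch below illustrates it on synthetic data: the 3D SIFT and CoHOG extraction steps are replaced with random arrays, both descriptor types are treated as per-interest-point features for simplicity (in the paper CoHOG is a global descriptor, so the exact concatenation step differs), and the array shapes and vocabulary size k are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_videos, n_classes = 30, 3

# Placeholder inputs: per-video sets of local descriptors plus a class label.
# In the paper these would come from dense 3D SIFT and CoHOG extraction.
labels = rng.integers(0, n_classes, n_videos)
sift_feats = [rng.normal(size=(40, 64)) for _ in range(n_videos)]   # "3D SIFT"
cohog_feats = [rng.normal(size=(40, 32)) for _ in range(n_videos)]  # "CoHOG"

def lda_reduce(feats, labels, n_components):
    """Fit LDA on all descriptors (each inheriting its video's label)
    and return the reduced descriptors, re-split per video."""
    X = np.vstack(feats)
    y = np.repeat(labels, [len(f) for f in feats])
    Xr = LinearDiscriminantAnalysis(n_components=n_components).fit_transform(X, y)
    return np.split(Xr, np.cumsum([len(f) for f in feats])[:-1])

# 1) Select discriminative features from each descriptor type with LDA
#    (LDA allows at most n_classes - 1 output dimensions).
sift_red = lda_reduce(sift_feats, labels, n_classes - 1)
cohog_red = lda_reduce(cohog_feats, labels, n_classes - 1)

# 2) Concatenate the reduced 3D SIFT and CoHOG features per interest point.
combined = [np.hstack([s, c]) for s, c in zip(sift_red, cohog_red)]

# 3) Build the visual vocabulary with k-means++ initialisation.
k = 50
kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10,
                random_state=0).fit(np.vstack(combined))

# 4) Quantise each video's descriptors into a visual-word count histogram.
hists = np.zeros((n_videos, k))
for i, descs in enumerate(combined):
    for w in kmeans.predict(descs):
        hists[i, w] += 1

# 5) Re-weight the histograms with TF-IDF and classify with an SVM.
X = TfidfTransformer().fit_transform(hists)
clf = SVC(kernel="linear").fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```

On real data the vocabulary and TF-IDF weights would be fit on training videos only, with test videos quantised against the fixed vocabulary before classification.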