Heavy Pose Empowered RGB Nets for Video Action Recognition

Song Ren, Meng Ding
DOI: 10.1109/ICCECE58074.2023.10135328
Published in: 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)
Publication date: 2023-01-06
Citations: 0

Abstract

Recent work on video action recognition has focused on using hybrid streams as input to obtain better results. These streams are usually combinations of the RGB channel with one additional feature stream, such as audio, optical flow, or pose information. Among these extra streams, posture, as unstructured data, is more difficult to fuse with the RGB channel than the others. In this paper, we propose Heavy Pose Empowered RGB Nets (HPER-Nets), an end-to-end multitasking model, based on a thorough investigation of how to fuse posture and RGB information. Given video frames as the only input, our model reinforces them by merging the intrinsic posture information in the form of part affinity fields (PAFs), and uses this hybrid stream to perform video action recognition. Experimental results show that our model outperforms other methods on the UCF-101, HMDB and Kinetics datasets: with only 16 frames, it records a 95.3% Top-1 accuracy on UCF-101, 69.6% on HMDB, and 41.0% on Kinetics.
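The abstract describes merging PAF maps with RGB frames into a single hybrid input stream but gives no implementation detail. One plausible reading is channel-wise concatenation of the per-frame PAF maps onto the RGB channels; the sketch below illustrates that idea only. The function name, tensor shapes, and fusion-by-concatenation are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def fuse_rgb_with_pafs(frames: np.ndarray, pafs: np.ndarray) -> np.ndarray:
    """Build a hybrid stream by concatenating PAF maps onto RGB frames.

    frames: (T, H, W, 3) float array of RGB frames.
    pafs:   (T, H, W, C) float array of PAF maps; C is typically
            2 * number_of_limbs (one x/y vector-component pair per limb).
    Returns an array of shape (T, H, W, 3 + C).
    """
    # The spatial and temporal dimensions must match before fusing.
    assert frames.shape[:3] == pafs.shape[:3]
    return np.concatenate([frames, pafs], axis=-1)

# Toy example: a 16-frame clip at 112x112 with PAFs for 19 limbs (38 channels).
frames = np.random.rand(16, 112, 112, 3).astype(np.float32)
pafs = np.random.rand(16, 112, 112, 38).astype(np.float32)
hybrid = fuse_rgb_with_pafs(frames, pafs)
print(hybrid.shape)  # (16, 112, 112, 41)
```

The resulting (T, H, W, 3 + C) tensor could then be fed to a video backbone in place of the plain RGB clip; how HPER-Nets actually consumes the hybrid stream is described in the paper itself.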