{"title":"Multi-Target Pose Estimation and Behavior Analysis Based on Symmetric Cascaded AdderNet","authors":"Xiaoshuo Jia;Qingzhen Xu;Aiqing Zhu;Xiaomei Kuang","doi":"10.1109/TMM.2025.3557614","DOIUrl":null,"url":null,"abstract":"In the tasks of pose estimation and behavior analysis in computer vision, conventional models are often constrained by various factors or complex environments (such as multiple targets, small targets, occluded targets, etc.). To address this problem, this paper proposes a symmetric cascaded additive network (MulAG) to improve the accuracy of posture estimation and behavior analysis in complex environments. MulAG consists of two modules, MulA and MulG. The MulA module is designed based on a cascaded symmetric network structure and incorporates the addition operation. MulA extracts the posture spatial features of the target from a single frame image. And, the MulG module is designed based on three continuous GRUs (gated recurrent unit). Based on the MulA, MulG extracts the posture temporal features from the posture spatial features of the moving target and predicts the posture temporal features of the moving target. The paper firstly demonstrates the feasibility of addition operations in pose estimation tasks by comparing with MobileNet-v3 in ablation experiments. Secondly, on the HiEve and CrowdPose datasets, MulA achieves accuracy of 79.6% and 80.4%, respectively, outperforming the PTM model by 12.0% and 21.2%. Detection speed of MulA achieves the best value at 8.6 ms, which is 1 times higher than HDGCN. The result demonstrates the effectiveness of MulA in multi-target pose estimation in complex scenes. Finally, on the HDMB-51 and UCF-101 datasets, MulAG achieves accuracy of 74.8% and 86.3%, respectively, outperforming HDGCN by 9.6% and 9.5%. Compared with SKP and GIST, the fps of MulAG (44.8 s<sup>−1</sup>) is improved by 8.2% and 8.9%. These experiments highlight the generalizability and superiority of MulAG in behavior analysis and pose estimation tasks.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"3197-3209"},"PeriodicalIF":8.4000,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10948348/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
In pose estimation and behavior analysis tasks in computer vision, conventional models are often constrained by complex environments (e.g., multiple targets, small targets, occluded targets). To address this problem, this paper proposes a symmetric cascaded adder network (MulAG) to improve the accuracy of pose estimation and behavior analysis in complex environments. MulAG consists of two modules, MulA and MulG. The MulA module is built on a cascaded symmetric network structure that incorporates the addition operation; it extracts the spatial pose features of the target from a single frame. The MulG module is built on three stacked GRUs (gated recurrent units); on top of MulA, it extracts temporal pose features from the spatial pose features of the moving target and predicts the target's future temporal pose features. The paper first demonstrates the feasibility of addition operations in pose estimation by comparing against MobileNet-v3 in ablation experiments. Second, on the HiEve and CrowdPose datasets, MulA achieves accuracies of 79.6% and 80.4%, respectively, outperforming the PTM model by 12.0% and 21.2%. MulA's detection speed reaches a best value of 8.6 ms, twice as fast as HDGCN. These results demonstrate the effectiveness of MulA for multi-target pose estimation in complex scenes. Finally, on the HMDB-51 and UCF-101 datasets, MulAG achieves accuracies of 74.8% and 86.3%, respectively, outperforming HDGCN by 9.6% and 9.5%. Compared with SKP and GIST, the frame rate of MulAG (44.8 fps) is higher by 8.2% and 8.9%. These experiments highlight the generalizability and superiority of MulAG in behavior analysis and pose estimation tasks.
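The paper itself provides no code; as a rough illustration of the two ideas named in the abstract, here is a minimal PyTorch sketch of an AdderNet-style layer (which replaces the multiply-accumulate of convolution with a negative L1 distance between input patches and filters) and a temporal head built from three stacked GRUs. `Adder2d`, `TemporalHead`, and all layer sizes are hypothetical stand-ins, not the authors' MulA/MulG implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adder2d(nn.Module):
    """AdderNet-style layer: scores each filter against each input patch
    with a negative L1 distance instead of a dot product (sketch only)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding
        # One flattened filter per output channel: (out_ch, in_ch * k * k)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch * kernel_size * kernel_size))

    def forward(self, x):
        n, _, h, w = x.shape
        # Sliding patches: (N, C*k*k, L), L = number of spatial positions
        patches = F.unfold(x, self.kernel_size, stride=self.stride, padding=self.padding)
        # Broadcast to (N, out_ch, C*k*k, L), then sum |patch - filter| over the patch dim
        out = -(patches.unsqueeze(1) - self.weight.unsqueeze(0).unsqueeze(-1)).abs().sum(dim=2)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.view(n, -1, h_out, w_out)

class TemporalHead(nn.Module):
    """MulG-like module: three stacked GRUs turning per-frame spatial pose
    features into temporal pose features (hypothetical sizes)."""
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=3, batch_first=True)

    def forward(self, frame_feats):           # (N, T, feat_dim)
        temporal_feats, _ = self.gru(frame_feats)
        return temporal_feats                  # (N, T, hidden)

if __name__ == "__main__":
    feats = Adder2d(3, 16, kernel_size=3, padding=1)(torch.randn(2, 3, 64, 64))
    print(feats.shape)                         # torch.Size([2, 16, 64, 64])
    seq = torch.randn(2, 8, 256)               # 8 frames of 256-d pose features
    print(TemporalHead()(seq).shape)           # torch.Size([2, 8, 256])
```

The appeal of the adder formulation, as the abstract's ablation against MobileNet-v3 suggests, is that additions and absolute values are cheaper than multiplications in hardware, which is consistent with the reported 8.6 ms detection speed.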
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.