InferTrans: Hierarchical structural fusion transformer for crowded human pose estimation

Impact factor: 14.7 · Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion · Published: 2024-12-20 · DOI: 10.1016/j.inffus.2024.102878

Muyu Li, Yingfeng Wang, Henan Hu, Xudong Zhao


Citations: 0

Abstract





Human pose estimation in crowded scenes presents unique challenges due to frequent occlusions and complex interactions between individuals. To address these issues, we introduce InferTrans, a hierarchical structural fusion Transformer designed to improve crowded human pose estimation. InferTrans integrates semantic features into structural information using a hierarchical joint-limb-semantic fusion module. By reorganizing joints and limbs into a tree structure, the fusion module facilitates effective information exchange across different structural levels and leverages both global structural information and local contextual details. Furthermore, we explicitly model limb structural patterns separately from joints, treating limbs as vectors with defined lengths and orientations. This allows our model to infer complete human poses from minimal input, significantly improving pose refinement. Extensive experiments on multiple datasets demonstrate that InferTrans outperforms existing pose estimation techniques in crowded and occluded scenarios. The proposed InferTrans serves as a robust post-processing technique capable of improving the accuracy and robustness of pose estimation in challenging environments.
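The idea of treating limbs as vectors with a length and an orientation, organized along a tree rooted at one joint, can be illustrated with a minimal sketch. This is not the authors' InferTrans code: the seven-joint 2-D skeleton, the `PARENT` table, and the helpers `joints_to_limbs` and `limbs_to_joints` are hypothetical, and the Transformer and semantic-fusion module are omitted entirely. The sketch only shows why a root joint plus limb vectors is enough to recover a complete pose.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Limb = Tuple[float, float]  # (length, orientation in radians)

# Hypothetical skeleton, not the paper's topology: child joint -> parent joint.
# Joint 0 (e.g. the neck) is the root. Entries are listed so every parent
# appears before its children (topological order; dicts keep insertion order).
PARENT: Dict[int, int] = {1: 0, 2: 1, 3: 2, 4: 0, 5: 4, 6: 5}


def joints_to_limbs(joints: List[Point]) -> Dict[int, Limb]:
    """Encode each limb (parent -> child) as a (length, orientation) pair."""
    limbs: Dict[int, Limb] = {}
    for child, parent in PARENT.items():
        dx = joints[child][0] - joints[parent][0]
        dy = joints[child][1] - joints[parent][1]
        limbs[child] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return limbs


def limbs_to_joints(root: Point, limbs: Dict[int, Limb]) -> List[Point]:
    """Rebuild every joint by walking the tree outward from the root."""
    placed: Dict[int, Point] = {0: root}
    for child, parent in PARENT.items():  # parents are always placed first
        length, angle = limbs[child]
        px, py = placed[parent]
        placed[child] = (px + length * math.cos(angle),
                         py + length * math.sin(angle))
    return [placed[i] for i in range(len(placed))]


# Usage: encode a toy pose, then rebuild it from the root alone.
pose = [(0.0, 0.0), (-1.0, 0.0), (-1.0, 1.0), (-1.0, 2.0),
        (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
limbs = joints_to_limbs(pose)
rebuilt = limbs_to_joints(pose[0], limbs)
shifted = limbs_to_joints((5.0, 5.0), limbs)  # same limbs, new root position
```

Because each joint is placed relative to its parent, an occluded joint can be filled in whenever its limb vector can be estimated, which is the intuition behind inferring complete poses from minimal input. How InferTrans actually predicts those vectors is not covered by this sketch.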



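Source journal

Information Fusion (Engineering and Technology / Computer Science: Theory and Methods)

CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months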

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.