Revising Representation and Target Deviations for Accurate Human Pose Estimation.

IF 10.2 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Neural Networks and Learning Systems | Pub Date: 2025-05-22 | DOI: 10.1109/tnnls.2025.3569464

Zian Zhang, Yongqiang Zhang, Yancheng Bai, Man Zhang, Rui Tian, Yin Zhang, Mingli Ding, Wangmeng Zuo


Citations: 0

Abstract

Owing to normalized instance scales and robust supervision, heatmap-based human pose estimation (HPE) methods with the top-down paradigm have achieved dominant performance. However, the basic framework has two inherent deviations, i.e., representation and target deviations, which create performance bottlenecks. The representation deviation is caused by transforming instances of various scales into a unified input size, which degrades performance because data with different scale-related characteristics can hardly be handled via unified parameters. The target deviation is caused by exploiting a prior distribution (e.g., Gaussian) to model the prediction error, which hinders sufficient network training. In this article, we propose a novel framework called DRPose to revise the abovementioned deviations. Specifically, to address the representation deviation, a scale-aware domain bridging (SDB) block is proposed to transfer feature maps from multiple scale-dependent domains into a unified intermediate domain with dynamic parameters. To address the target deviation, a differentiable coordinate decoder (DCD) is presented to adaptively adjust the target distribution of heatmaps in an end-to-end manner. Extensive experiments show that the proposed method significantly improves the performance of most existing models at negligible additional cost. Beyond this, our method achieves 77.1% AP on the COCO test-dev set, outperforming prior works with similar model complexity.
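To make the "target deviation" discussed in the abstract more concrete, the following minimal sketch contrasts a fixed Gaussian heatmap target with two ways of decoding a heatmap into coordinates: a hard argmax and a soft-argmax. Soft-argmax is a generic, widely used differentiable decoding technique and is used here only as a stand-in; it is not the paper's DCD, whose details the abstract does not give. All function names, the grid size, and the `sigma` and `beta` values are assumptions chosen for illustration.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Build a fixed Gaussian target centred on a keypoint at (cx, cy).

    This is the kind of hand-chosen prior distribution the abstract refers to.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def argmax_decode(heatmap):
    """Hard decoding: return the integer (x, y) of the largest heatmap value.

    The result is quantised to the grid and is not differentiable.
    """
    best = max(
        ((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return best[1], best[2]


def soft_argmax_decode(heatmap, beta=10.0):
    """Soft-argmax: softmax over the heatmap, then the expected (x, y).

    `beta` (an illustrative temperature) controls how sharply the softmax
    concentrates on the peak. Every step is smooth, so in an autodiff
    framework gradients could flow from coordinates back to the heatmap.
    """
    peak = max(v for row in heatmap for v in row)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * xi for row in weights for xi, w in enumerate(row)) / total
    y = sum(w * yi for yi, row in enumerate(weights) for w in row) / total
    return x, y


if __name__ == "__main__":
    # Hypothetical keypoint at a sub-pixel location on a 12x8 heatmap.
    target = gaussian_heatmap(12, 8, cx=5.3, cy=2.7, sigma=1.0)
    print("argmax:", argmax_decode(target))
    print("soft-argmax:", soft_argmax_decode(target))
```

In this toy setting the hard argmax can only return grid positions, while the soft-argmax returns a real-valued estimate. Because such a decoder is differentiable, a loss can in principle be placed on coordinates rather than on a fixed heatmap shape, which is the general motivation behind end-to-end decoders of this kind.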


Source journal

IEEE Transactions on Neural Networks and Learning Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 23.80

Self-citation rate: 9.60%

Articles published: 2102

Review time: 3-8 weeks

About the journal: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.