Sequence-to-Sequence Learning for Human Pose Correction in Videos

2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), Pub Date: 2017-11-01, DOI: 10.1109/ACPR.2017.126

S. Swetha, V. Balasubramanian, C. V. Jawahar

Citations: 2

Abstract

The power of ConvNets has been demonstrated in a wide variety of vision tasks, including pose estimation. However, they often produce badly erroneous predictions in videos because of unusual poses, challenging illumination, blur, self-occlusion, and similar factors. These erroneous predictions can be refined by using previous and future predictions as a temporal smoothness constraint across the video. In this paper, we present a generic approach to pose correction in videos using sequence learning that makes minimal assumptions about the sequence structure. The proposed model is generic and fast, and it surpasses the state of the art on benchmark datasets. We use a generic pose estimator to obtain initial pose estimates, which are then refined by our method. The proposed architecture uses a Long Short-Term Memory (LSTM) encoder-decoder model to encode the temporal context and refine the estimates. We show a 3.7% gain over the Yang & Ramanan (YR) baseline and a 2.07% gain over the Spatial Fusion Network (SFN) on a new, challenging YouTube Pose Subset dataset.
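
The encoder-decoder data flow described in the abstract can be illustrated with a minimal, untrained sketch. This is not the authors' implementation: the names `LSTMCell` and `refine_poses`, the hidden size, the random weights, the choice to feed the decoder its own previous output, and the residual correction added to each initial estimate are all assumptions made for illustration. The abstract does not specify any of these details. The sketch only shows how a sequence of per-frame pose estimates might be encoded into a fixed-size state and decoded back into one refined pose per frame.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell. Weights are random because this sketch is untrained."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One stacked weight matrix for the four gates:
        # input, forget, cell candidate, output.
        self.W = rand_matrix(4 * hidden_size, input_size + hidden_size, rng)
        self.b = [0.0] * (4 * hidden_size)

    def step(self, x, h, c):
        n = self.hidden_size
        z = [a + b for a, b in zip(matvec(self.W, list(x) + h), self.b)]
        i = [sigmoid(v) for v in z[0:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:4 * n]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def refine_poses(poses, hidden_size=16, seed=0):
    """poses: list of T frames, each a flat list [x1, y1, ..., xJ, yJ]."""
    rng = random.Random(seed)
    dim = len(poses[0])
    encoder = LSTMCell(dim, hidden_size, rng)
    decoder = LSTMCell(dim, hidden_size, rng)
    W_out = rand_matrix(dim, hidden_size, rng)

    # Encoder: read the whole sequence of initial estimates and keep
    # only the final hidden and cell state as the temporal context.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for frame in poses:
        h, c = encoder.step(frame, h, c)

    # Decoder: start from the encoder state and emit one pose per frame.
    # Assumption: each step receives the previous output (zeros at first),
    # and the output is a residual added to the initial estimate.
    refined = []
    prev = [0.0] * dim
    for frame in poses:
        h, c = decoder.step(prev, h, c)
        delta = matvec(W_out, h)
        prev = [p + d for p, d in zip(frame, delta)]
        refined.append(prev)
    return refined
```

Calling `refine_poses` on a list of `T` frames returns `T` frames of the same dimensionality. With random weights the corrections carry no meaning; in a trained model the weights would be fit so that the decoded sequence matches ground-truth poses.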
