Two-Stage Representation Refinement Based on Convex Combination for 3-D Human Poses Estimation

IEEE Transactions on Artificial Intelligence | Pub Date: 2024-07-22 | DOI: 10.1109/TAI.2024.3432028

Luefeng Chen;Wei Cao;Biao Zheng;Min Wu;Witold Pedrycz;Kaoru Hirota

IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 6500-6508. Journal article. https://ieeexplore.ieee.org/document/10606307/

Citations: 0

Abstract

In the human pose estimation task, distinguishing the different 2-D poses associated with a 3-D pose is difficult when the view is limited, and the lifting ambiguity is hard to reduce because depth information is missing. This makes the task an important and challenging problem. To address it, a two-stage representation refinement based on convex combination is proposed for 3-D human pose estimation. The two-stage method consists of a dense-spatial-temporal convolutional network and a local-to-refine network: the former determines the features between video frames, and the latter captures pose details at different scales. The method targets the difficulty of estimating 3-D human pose from 2-D image sequences, and it makes better use of the relations between the frames of a pose video to produce more accurate results. Finally, we combine the above networks with a block called convex combination to help refine the 3-D pose location. We test the proposed approach on both the Human3.6m and MPII datasets. The results confirm that our method achieves better performance than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. In addition, a robustness experiment is carried out in which the input is interrupted, and the results verify that our method shows better robustness.
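The abstract names a convex combination block but does not describe its internals. As a rough illustration of the general idea only, the sketch below blends several candidate 3-D poses joint by joint, using weights that are non-negative and sum to 1. The function names (`softmax`, `convex_combine`), the use of a softmax to produce the weights, and the sample poses are all assumptions made for this example, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) of one joint
Pose = List[Joint]                  # one 3-D pose = list of joints


def softmax(scores: Sequence[float]) -> List[float]:
    """Map arbitrary scores to non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def convex_combine(poses: Sequence[Pose], weights: Sequence[float]) -> Pose:
    """Blend candidate poses joint by joint with convex weights (hypothetical helper)."""
    if len(poses) != len(weights):
        raise ValueError("need exactly one weight per candidate pose")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must be non-negative and sum to 1")
    num_joints = len(poses[0])
    if any(len(pose) != num_joints for pose in poses):
        raise ValueError("all candidate poses must have the same number of joints")
    blended: Pose = []
    for j in range(num_joints):
        # Weighted average of joint j across all candidates, one axis at a time.
        blended.append(tuple(
            sum(w * pose[j][axis] for w, pose in zip(weights, poses))
            for axis in range(3)
        ))
    return blended


# Two made-up candidate poses with two joints each.
coarse = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
refined = [(0.0, 0.0, 0.0), (3.0, 2.0, 1.0)]
weights = softmax([0.0, 0.0])  # equal scores give equal weights
blended = convex_combine([coarse, refined], weights)
print(blended)
```

Because the weights are convex, every blended joint lies on the segment (or, with more candidates, inside the convex hull) spanned by the candidate joints, so the result cannot drift outside the range of the input estimates.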


Source journal

IEEE Transactions on Artificial Intelligence

CiteScore

7.70

Self-citation rate

0.00%

Number of publications