Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos

BMVC: Proceedings of the British Machine Vision Conference. Published: 2022-10-07. DOI: 10.48550/arXiv.2210.03659

Boyang Zhang, Suping Wu, Hu Cao, Kehua Ma, Pan Li, Lei Lin


Citations: 1

Abstract


In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on extending 3D human datasets and on temporal-based learning to improve accuracy and temporal smoothness. Different from them, our STR aims to learn accurate and natural motion sequences in unconstrained environments through temporal and spatial tendencies, and to fully exploit the spatio-temporal features of existing video data. To this end, STR learns feature representations in the temporal and spatial dimensions separately, so as to obtain a more robust spatio-temporal representation. More specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR builds hierarchical residual connections along the time dimension within a video sequence to reason about the tendencies of the temporal sequence while preserving the effective propagation of human information. Meanwhile, to enhance the spatial representation, we design a spatial tendency enhancing (STE) module that further learns to excite spatial, time-frequency-sensitive features in the human motion representation. Finally, we introduce integration strategies to fuse and refine the spatio-temporal feature representations. Extensive experiments on large-scale publicly available datasets show that STR remains competitive with the state of the art on three datasets. Our code is available at https://github.com/Changboyang/STR.git.
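The abstract names the TTR and STE modules but does not specify their internals. The NumPy sketch below is a rough illustration of how the two ideas could be realized on a sequence of per-frame features. It is not the authors' code, which is in the repository linked above. It rests on three assumptions. TTR is modeled as a Res2Net-style hierarchical residual split across channel groups, using depthwise temporal convolutions. STE is modeled as a squeeze-and-excitation gate driven by each channel's FFT magnitude along time. The "integration strategy" is taken to be a plain average of the two branches. All function names, shapes, and weights are hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT the authors' STR implementation.
# Features are a (T, C) array: T frames, C channels per frame.


def temporal_conv1d(x, kernel):
    """Depthwise 1-D convolution along time with zero 'same' padding.

    x: (T, C) features; kernel: (K, C) per-channel weights, K odd.
    """
    k_size = kernel.shape[0]
    pad = k_size // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    t_len = x.shape[0]
    out = np.zeros_like(x)
    for k in range(k_size):
        out += xp[k:k + t_len] * kernel[k]  # shifted frames times tap k
    return out


def hierarchical_temporal_residual(x, kernels):
    """Assumed TTR stand-in: Res2Net-style hierarchical residual over
    channel groups, each step a temporal convolution.

    C must be divisible by s = len(kernels) + 1; kernels[i] is (K, C // s).
    """
    s = len(kernels) + 1
    groups = np.split(x, s, axis=1)
    outs = [groups[0]]  # first group passes through unchanged
    prev = None
    for i in range(1, s):
        # each group also receives the previous group's output, so its
        # temporal receptive field grows by K - 1 frames per step
        inp = groups[i] if prev is None else groups[i] + prev
        prev = temporal_conv1d(inp, kernels[i - 1])
        outs.append(prev)
    return x + np.concatenate(outs, axis=1)  # outer residual connection


def frequency_channel_excitation(x, w1, w2):
    """Assumed STE stand-in: squeeze-and-excitation gate driven by each
    channel's temporal spectrum. Requires T >= 2.

    w1: (C, r) and w2: (r, C) are the bottleneck weights.
    """
    spectrum = np.abs(np.fft.rfft(x, axis=0))  # (T // 2 + 1, C)
    squeeze = spectrum[1:].mean(axis=0)  # mean spectral magnitude, DC skipped
    hidden = np.maximum(squeeze @ w1, 0.0)  # ReLU bottleneck, shape (r,)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # sigmoid gate, shape (C,)
    return x * gate  # rescale every channel, broadcast over time


rng = np.random.default_rng(0)
T, C, s, K, r = 16, 8, 4, 3, 2
feats = rng.standard_normal((T, C))
kernels = [0.1 * rng.standard_normal((K, C // s)) for _ in range(s - 1)]
w1 = 0.1 * rng.standard_normal((C, r))
w2 = 0.1 * rng.standard_normal((r, C))

t_out = hierarchical_temporal_residual(feats, kernels)
s_out = frequency_channel_excitation(feats, w1, w2)
fused = 0.5 * (t_out + s_out)  # hypothetical fusion: plain average
print(fused.shape)  # (16, 8)
```

The cascade is what makes the residual structure hierarchical in time. Later channel groups pass through more chained convolutions, so they summarize longer motion trends, while the first group keeps the unmodified signal. The frequency gate instead reweights channels by how strongly they vary over the clip, which is one plausible reading of "time-frequency-sensitive" features.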

