FMR-GNet: Forward Mix-Hop Spatial-Temporal Residual Graph Network for 3D Pose Estimation
Authors: Honghong Yang; Hongxi Liu; Yumei Zhang; Xiaojun Wu
Journal: Chinese Journal of Electronics, vol. 33, no. 6, pp. 1346-1359 (published 2024-11-11)
Impact factor: 1.6 (JCR Q3, Engineering, Electrical & Electronic)
DOI: 10.23919/cje.2022.00.365
URL: https://ieeexplore.ieee.org/document/10748551/
Graph convolutional networks that leverage spatial-temporal information from skeletal data have emerged as a popular approach for 3D human pose estimation. However, comprehensively modeling consistent spatial-temporal dependencies among the body joints remains challenging. Current approaches are limited in three ways: they perform graph convolutions solely on immediate neighbors, deploy separate spatial or temporal modules, and use single-pass feedforward architectures. To address these limitations, we propose a forward multi-scale residual graph convolutional network (FMR-GNet) for 3D pose estimation from monocular video. First, we introduce a mix-hop spatial-temporal attention graph convolution layer that aggregates neighboring features with learnable weights over large receptive fields. The attention mechanism dynamically computes edge weights at each layer. Second, we devise a cross-domain spatial-temporal residual module that fuses multi-scale spatial-temporal convolutional features through residual connections, explicitly modeling interdependencies across the spatial and temporal domains. Third, we integrate a forward dense connection block that propagates spatial-temporal representations across network layers, enabling high-level semantic skeleton information to enrich lower-level features. Comprehensive experiments on two challenging 3D human pose estimation benchmarks, Human3.6M and MPI-INF-3DHP, demonstrate that the proposed FMR-GNet achieves superior performance, surpassing most state-of-the-art methods.
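The idea of aggregating over more than immediate neighbors can be illustrated with a generic mix-hop graph convolution, which concatenates features propagated over 0, 1, ..., K hops of a normalized adjacency matrix. The sketch below is only an illustration under stated assumptions, not the FMR-GNet layer: it omits the attention mechanism and the temporal dimension described in the abstract, and the function names, the 4-joint chain skeleton, and the feature sizes are all hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I (a common choice, assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def mix_hop_layer(x, adj, weights):
    """Concatenate features aggregated over hops 0..K.

    x:       (num_joints, in_dim) joint features
    adj:     (num_joints, num_joints) skeleton adjacency, no self-loops
    weights: list of K+1 matrices of shape (in_dim, out_dim), one per hop
    """
    a_norm = normalize_adjacency(adj)
    hop_features = []
    propagated = x  # hop 0: the features themselves
    for w in weights:
        hop_features.append(propagated @ w)
        propagated = a_norm @ propagated  # advance by one more hop
    return np.concatenate(hop_features, axis=1)


# Hypothetical toy skeleton: a chain of 4 joints (0-1-2-3).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))                            # 2 input features per joint
weights = [rng.normal(size=(2, 3)) for _ in range(3)]  # hops 0, 1, 2
out = mix_hop_layer(x, adj, weights)                   # shape (4, 9)
```

In this toy chain, the 1-hop block of joint 0 depends only on joints 0 and 1, while its 2-hop block also depends on joint 2, which is the sense in which mixing hops enlarges the receptive field within a single layer.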
About the journal:
CJE focuses on the emerging fields of electronics, publishing innovative and transformative research papers. Most of the papers published in CJE are from universities and research institutes, presenting their innovative research results. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to the hot topics in electronics are strongly recommended.