MITPose: Multi-Granularity Feature Interaction for Human Pose Estimation
Authors: Jiayu Zou, Jie Qin, Zhen Zhang, Xingang Wang
Venue: 2022 7th International Conference on Image, Vision and Computing (ICIVC)
Published: 2022-07-26
DOI: 10.1109/ICIVC55077.2022.9887304
Human pose estimation is widely used in action recognition, person re-identification, and multi-object tracking. Deep convolutional neural networks (CNNs) have recently shown strong performance on this task. However, CNN-based methods are limited by their constrained receptive field, which makes them poor at modeling global relationships between different body parts. In this paper, we propose MITPose, a novel multi-granularity feature interaction network for human pose estimation that exploits feature interaction across global-local level features, multi-scale features, and locality features. MITPose efficiently leverages the long-range representation ability of transformer networks and the inductive locality of convolutional networks to obtain comprehensive information for keypoint localization and relationship modeling. Extensive experiments show that MITPose achieves state-of-the-art performance on the public COCO dataset.
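The contrast the abstract draws between a constrained convolutional receptive field and the long-range reach of a transformer can be illustrated with a toy sketch. This is not the MITPose architecture, which the abstract does not specify. It is a minimal pure-Python example on a 1D sequence of scalar "features", and the function names (`local_conv1d`, `global_self_attention`, `fuse`), the smoothing kernel, the identity attention projections, and the additive fusion are all assumptions made for illustration.

```python
import math


def local_conv1d(x, kernel):
    """Zero-padded 1D convolution (cross-correlation form).

    Each output position only sees len(kernel) neighbouring inputs,
    which is the constrained receptive field of a convolution layer.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(x))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_self_attention(x):
    """Single-head self-attention over scalar tokens.

    Assumption: query, key and value projections are the identity,
    so every output position is a weighted sum over ALL inputs.
    """
    out = []
    for q in x:
        weights = softmax([q * key for key in x])
        out.append(sum(w * v for w, v in zip(weights, x)))
    return out


def fuse(x, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical fusion: add the local and global branches elementwise."""
    local = local_conv1d(x, kernel)
    global_ = global_self_attention(x)
    return [l + g for l, g in zip(local, global_)]


if __name__ == "__main__":
    a = [1.0, 0.0, 0.0, 0.0, 2.0]
    b = [1.0, 0.0, 0.0, 0.0, 5.0]  # only the far-away last element differs
    kernel = (0.25, 0.5, 0.25)
    # The convolution output at position 0 cannot see the change.
    print(local_conv1d(a, kernel)[0], local_conv1d(b, kernel)[0])
    # The attention output at position 0 does react to it.
    print(global_self_attention(a)[0], global_self_attention(b)[0])
```

Changing only the last element leaves the convolution output at position 0 untouched while the attention output at position 0 shifts. This is the kind of global relationship between distant positions that the abstract says CNN-based methods model poorly.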