Shek Wai Chu, Chaoyi Zhang, Yang Song, Weidong (Tom) Cai
2022 IEEE International Conference on Image Processing (ICIP), 16 October 2022. DOI: 10.1109/ICIP46576.2022.9897882
Channel-Position Self-Attention with Query Refinement Skeleton Graph Neural Network in Human Pose Estimation
Human Pose Estimation (HPE) is a long-standing yet challenging task in computer vision. The problem requires comprehensive global contextual reasoning among joints at different locations. In this work, we explore how to combine two popular and effective concepts, self-attention and Graph Neural Networks (GNNs), to model long-range information in HPE. We study three different ways of applying self-attention to 3D feature maps and find that the channel-position variant performs best. Accuracy is further improved by refining the queries with an efficient channel-wise parallel GNN that explicitly models the graph structure of human joints. The resulting approach improves prediction accuracy over strong baseline models and achieves state-of-the-art results.
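The abstract does not define the three self-attention variants or the GNN in detail. The following is therefore only a minimal NumPy sketch under stated assumptions and is not the authors' implementation. It contrasts two generic ways of applying self-attention to a C×H×W feature map: treating spatial positions as tokens, and treating channels as tokens. It then adds a hypothetical query-refinement step, which is one GCN-style propagation over a skeleton adjacency matrix and assumes that each channel corresponds to one joint. The paper's channel-position variant is not reproduced here. All function names, the toy 5-joint chain and the `gain` parameter are illustrative assumptions, and learned Q/K/V projections are omitted for brevity.

```python
import numpy as np

# Illustrative sketch only; NOT the paper's implementation.


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_self_attention(feat):
    """Tokens = the H*W spatial positions, each a C-dim vector.

    Simplification: queries, keys and values are the raw features
    (no learned projections).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                # (N, C), N = H*W
    attn = softmax(x @ x.T / np.sqrt(c))        # (N, N) position-to-position weights
    return (attn @ x).T.reshape(c, h, w)


def channel_self_attention(feat, queries=None):
    """Tokens = the C channels, each an (H*W)-dim vector.

    `queries` optionally replaces the raw features on the query side,
    e.g. with graph-refined queries.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                  # (C, N)
    q = x if queries is None else queries
    attn = softmax(q @ x.T / np.sqrt(h * w))    # (C, C) channel-to-channel weights
    return (attn @ x).reshape(c, h, w)


def skeleton_adjacency(num_joints, edges):
    """GCN-style normalized adjacency D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(num_joints)                      # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0                 # undirected bone
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def refine_queries(feat, a_hat, gain=0.5):
    """Hypothetical query refinement: one parameter-free graph propagation
    step over the joint graph, applied to every spatial column in parallel.
    Assumes channel k of `feat` corresponds to joint k.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                  # (K, N), one row per joint
    return x + gain * np.maximum(a_hat @ x, 0.0)  # residual ReLU(A_hat @ X)


# Usage: a toy 5-joint chain (hypothetical head-neck-hip-knee-ankle)
rng = np.random.default_rng(0)
feat = rng.standard_normal((5, 8, 8))           # (C = K joints, H, W)
a_hat = skeleton_adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
out_pos = position_self_attention(feat)
out_ch = channel_self_attention(feat, queries=refine_queries(feat, a_hat))
print(out_pos.shape, out_ch.shape)              # (5, 8, 8) (5, 8, 8)
```

The two variants differ in cost. Position-wise attention forms an (HW)×(HW) weight matrix, whereas channel-wise attention forms only a C×C matrix. A trained model would also give the graph step learnable weights in place of the fixed `gain` used here.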