Daxing Chen, Xinghao Song, Shixi Fan, Hongpeng Wang
2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), published 2019-12-01. DOI: 10.1109/ROBIO49542.2019.8961623
An Attention Module for Multi-Person Pose Estimation
In top-down approaches to multi-person pose estimation, a human detector first generates a set of human bounding boxes; each detected person is then cropped from the image and passed to a single-person pose estimation model to obtain the final result. However, body parts of other people that appear in the cropped image interfere with the single-person model and lead to inaccurate results. To alleviate this problem by effectively modeling the relationships between adjacent keypoints, we propose an attention module that gives the model a global receptive field in the shallow layers of the network and lets it pay more attention to the key areas that matter most for pose estimation. Experimental results show that our method achieves 73.9% mAP on the COCO test-dev dataset, a 2.4% absolute improvement over our baseline.
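The abstract does not describe the internals of the proposed attention module, so the following is only a minimal sketch of the general idea of a global receptive field: a generic dot-product self-attention over all spatial positions of a feature map, so that every location can draw on every other location even in a shallow layer. The function names, the absence of learned projections, and the residual connection are all assumptions made for illustration; this is not the authors' module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_spatial_attention(feature_map):
    """Generic self-attention across all spatial positions (hypothetical sketch).

    feature_map: nested lists of shape H x W x C.
    Assumption: queries, keys and values are the raw features themselves;
    a real module would normally use learned projections.
    Returns (output, weights): output has shape H x W x C and
    weights has shape N x N with N = H * W.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    # Flatten the grid so each position can attend to every other position.
    positions = [feature_map[r][c] for r in range(height) for c in range(width)]
    scale = math.sqrt(channels)

    weights, attended = [], []
    for query in positions:
        # Scaled dot-product similarity between this position and all positions.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in positions]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all feature vectors: the global context.
        context = [sum(w * value[ch] for w, value in zip(row, positions))
                   for ch in range(channels)]
        # Residual connection (an assumption) keeps the original feature.
        attended.append([x + y for x, y in zip(query, context)])

    # Restore the H x W layout.
    output = [attended[r * width:(r + 1) * width] for r in range(height)]
    return output, weights


# Toy 2 x 2 feature map with 2 channels.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [0.0, 0.0]]]
output, weights = global_spatial_attention(features)
```

Each row of `weights` is a distribution over all positions in the crop, which is what lets a single layer relate distant keypoints. In a trained network the similarity scores would come from learned projections, so the weighting could in principle favor the target person's body parts over those of a neighboring person.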