E2Pose: Fully Convolutional Networks for End-to-End Multi-Person Pose Estimation

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Pub Date: 2022-10-23. DOI: 10.1109/IROS47612.2022.9981322

Masakazu Tobeta, Y. Sawada, Ze Zheng, Sawa Takamuku, N. Natori


Citations: 2

Abstract



Highly accurate multi-person pose estimation at a high framerate is a fundamental problem in autonomous driving, and solving it could help prevent pedestrian-car accidents. The present study tackles this problem by proposing a new model that attaches a feature pyramid and an original head to a general backbone. The original head is built from lightweight CNNs and directly estimates multi-person pose coordinates. This configuration avoids the complex post-processing and two-stage estimation adopted by other models and keeps the model lightweight. Our model can be trained end-to-end and runs in real time on a resource-limited platform (a low-cost edge device) during inference. Experimental results on the COCO and CrowdPose datasets showed that our model achieves a higher framerate (approx. 20 frames/sec on an NVIDIA Jetson AGX Xavier) than other state-of-the-art models while maintaining sufficient accuracy for practical use.
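To make "directly estimates multi-person pose coordinates" more concrete, here is a minimal sketch of how a dense, single-stage prediction could be decoded into poses without heatmap peak-finding or keypoint grouping. The abstract does not describe the head's output format, so everything here is an assumption for illustration: the function name `decode_poses`, the per-cell score grid, the per-keypoint `(dx, dy)` offsets in cell units, and the `stride` and `threshold` parameters are hypothetical and are not taken from the E2Pose paper or its code.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float]


def decode_poses(
    scores: List[List[float]],
    offsets: List[List[List[Tuple[float, float]]]],
    stride: int = 8,
    threshold: float = 0.5,
) -> List[Tuple[float, List[Keypoint]]]:
    """Turn a dense grid of per-cell predictions into a list of poses.

    Assumed (hypothetical) output format of the head:
      scores[y][x]:  confidence that a person is centred on cell (x, y)
      offsets[y][x]: one (dx, dy) per keypoint, in cell units, measured
                     from the centre of cell (x, y)
    """
    poses = []
    for y, row in enumerate(scores):
        for x, score in enumerate(row):
            if score < threshold:
                continue  # no person predicted at this cell
            cx, cy = x + 0.5, y + 0.5  # cell centre in grid coordinates
            # Each keypoint is read off directly; no grouping step is needed
            # because all offsets in one cell belong to the same person.
            keypoints = [((cx + dx) * stride, (cy + dy) * stride)
                         for dx, dy in offsets[y][x]]
            poses.append((score, keypoints))
    # Highest-confidence poses first.
    poses.sort(key=lambda p: p[0], reverse=True)
    return poses


# Toy 2x2 grid with two keypoints per cell; only cell (x=1, y=0) fires.
zero = [(0.0, 0.0), (0.0, 0.0)]
scores = [[0.1, 0.9], [0.2, 0.3]]
offsets = [[zero, [(0.0, 0.25), (-0.5, 1.0)]], [zero, zero]]
for score, keypoints in decode_poses(scores, offsets, stride=8):
    print(score, keypoints)
```

A decode of this kind is a single pass over the grid, which is one plausible reason a direct-regression design suits the reported operating point: at roughly 20 frames/sec the whole pipeline has about 50 ms per frame.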

