Meng Li, Haoqian Wang, Yongbing Zhang, Yi Yang. 2019 IEEE International Conference on Imaging Systems and Techniques (IST), published 2019-12-01. https://doi.org/10.1109/IST48021.2019.9010189
A Deconvolutional Bottom-up Deep Network for multi-person pose estimation
Because of the trade-off between model complexity and estimation accuracy, current human pose estimators either have low accuracy or require long running times. This dilemma is especially severe in real-time multi-person pose estimation. To address this issue, we design a deep network with a reduced parameter count and high estimation accuracy by using deconvolution layers in place of the widely used fully connected configuration. Specifically, our model consists of two successive parts: a detection network and a matching network. The former outputs keypoint heatmaps and person locations, and the latter then produces the final pose estimate using multiple deconvolutional layers. Thanks to its simple structure and its explicit use of the previously neglected spatial structure in the heatmaps, the matching network is both highly efficient and highly precise. Experiments on the challenging COCO dataset demonstrate that our method greatly reduces the number of parameters in the matching network while achieving higher accuracy than existing methods.
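To make the parameter argument concrete, the following is a minimal sketch that compares the parameter count of one fully connected layer with that of one deconvolution (transposed convolution) layer on a heatmap-shaped input. Every size here is an assumption chosen for illustration (17 channels matches the number of COCO keypoints; the 32x32 resolution, the 4x4 kernel, the stride and the padding are hypothetical), and the helper functions are not taken from the paper.

```python
# Illustrative parameter counts only; all layer sizes are assumed, not the paper's.

def fc_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def deconv_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights plus biases of a 2D transposed convolution with a square kernel."""
    return in_channels * out_channels * kernel * kernel + out_channels


def deconv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# Hypothetical input: 17 keypoint heatmaps (the COCO keypoint count) at 32x32.
channels, height, width = 17, 32, 32

# Fully connected: flatten the heatmaps and map them to a vector of the same size.
flat = channels * height * width
fc = fc_params(flat, flat)

# Transposed convolution: the same 4x4 kernel is shared across all spatial positions.
deconv = deconv_params(channels, channels, kernel=4)
out_size = deconv_output_size(height, kernel=4, stride=2, padding=1)

print(f"fully connected parameters: {fc:,}")
print(f"deconvolution parameters:   {deconv:,}")
print(f"deconvolution output size:  {out_size}x{out_size}")
```

The fully connected count grows with the square of the flattened heatmap size, whereas the deconvolution count depends only on the channel counts and the kernel size. Because the kernel slides over the heatmap, the layer also keeps the spatial arrangement that flattening discards.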