3D Human Pose Estimation: Using Context Information in Monocular Video
Authors: Yuan-yuan Zhou, Xiaoyan Hu
Published in: 2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), 2021-08-01
DOI: 10.1109/ICCEAI52939.2021.00001 (https://doi.org/10.1109/ICCEAI52939.2021.00001)
Citations: 1
Abstract
We propose a context-based, two-stage network for 3D human pose estimation. The first stage obtains the 2D human pose and 2D key-points from the video stream; this stage is crucial to the subsequent work and to the entire process. After analyzing the limitations and shortcomings of existing models, we propose a context-based human pose estimation network that incorporates a BiLSTM structure into the pose machine method. In our model, invisible key-points are jointly predicted from the human pose in the current frame and from context information. Quantitative and visualization experiments show that the model mitigates both the invisible key-points caused by occlusion and the incorrect linking of human key-points. In the second stage, the 3D human pose is obtained through sparse representation and 3D reconstruction. The experimental results show that our method achieves higher accuracy than existing human pose estimation methods for video streams and performs better under occlusion.
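The idea of predicting an invisible key-point from temporal context can be illustrated with a much simpler stand-in than the network described above. The sketch below is not the authors' BiLSTM model: it replaces the learned predictor with plain linear interpolation between the nearest visible frames before and after the occluded one, using one forward pass and one backward pass over the sequence. The function name `fill_from_context` and the convention of marking an invisible key-point as `None` are assumptions made for this illustration only.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fill_from_context(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Fill invisible (None) 2D positions of one joint from past and future frames.

    Illustrative stand-in for a learned bidirectional model: it only
    interpolates linearly between the nearest visible frames.
    """
    n = len(track)

    # Forward pass: index of the latest visible frame at or before t.
    prev_seen: List[Optional[int]] = [None] * n
    last: Optional[int] = None
    for t in range(n):
        if track[t] is not None:
            last = t
        prev_seen[t] = last

    # Backward pass: index of the earliest visible frame at or after t.
    next_seen: List[Optional[int]] = [None] * n
    nxt: Optional[int] = None
    for t in range(n - 1, -1, -1):
        if track[t] is not None:
            nxt = t
        next_seen[t] = nxt

    filled: List[Optional[Point]] = []
    for t in range(n):
        p, q = prev_seen[t], next_seen[t]
        if track[t] is not None:
            filled.append(track[t])          # visible: keep the detection
        elif p is not None and q is not None:
            w = (t - p) / (q - p)            # relative position between p and q
            (x0, y0), (x1, y1) = track[p], track[q]
            filled.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        elif p is not None:
            filled.append(track[p])          # only past context available
        elif q is not None:
            filled.append(track[q])          # only future context available
        else:
            filled.append(None)              # joint never visible in this clip
    return filled


if __name__ == "__main__":
    # One joint over five frames; frames 1, 2 and 4 are occluded.
    wrist = [(0.0, 0.0), None, None, (3.0, 6.0), None]
    print(fill_from_context(wrist))
```

A learned bidirectional model would additionally use the other joints of the current frame and appearance features, which this per-joint interpolation ignores.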