S. Nguyen, Thi-Thu-Hong Le, Hoang-Bach Nguyen, Thanh-Tung Phan, Chi-Thanh Nguyen, Hai Vu
{"title":"基于HOPE-Net和Mask R-CNN的自中心视觉手部姿态估计改进","authors":"S. Nguyen, Thi-Thu-Hong Le, Hoang-Bach Nguyen, Thanh-Tung Phan, Chi-Thanh Nguyen, Hai Vu","doi":"10.1109/MAPR56351.2022.9924768","DOIUrl":null,"url":null,"abstract":"Hand pose estimation is the task of predicting the position and orientation of the hand and fingers relative to some coordinate system. It is an important task or input for applications in robotics, medical or human-computer interaction. In recent years, the success of deep convolutional neural networks and the popularity of low-cost consumer wearable cameras have made hand pose estimation on egocentric images using deep neural networks a hot topic in the computer vision field. This paper proposes a novel deep model for accurate 2D hand pose estimation that combines HOPE-Net, which estimates hand pose, and Mask R-CNN, which provides hand detection and segmentation to localize the hand in the image. First, HOPENet is used to predict the initial 2D hand pose, and the hand features are extracted from an image with a hand in the center, which is cropped from the original image based on Mask RCNN’s output. Then, we combine the initial 2D hand pose and the hand features into a fully connected layer to predict the 2D hand pose correctly. Our experiments show that the proposed model outperforms the original HOPE-Net in 2D hand pose estimation. The proposed method’s mean endpoint error (mEPE) is 48.82 pixels, while the mEPE of the 2D HOPE-Net predictor is 86.30 pixels on the First-Person Hand Action dataset.","PeriodicalId":138642,"journal":{"name":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving the Hand Pose Estimation from Egocentric Vision via HOPE-Net and Mask R-CNN\",\"authors\":\"S. Nguyen, Thi-Thu-Hong Le, Hoang-Bach Nguyen, Thanh-Tung Phan, Chi-Thanh Nguyen, Hai Vu\",\"doi\":\"10.1109/MAPR56351.2022.9924768\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hand pose estimation is the task of predicting the position and orientation of the hand and fingers relative to some coordinate system. It is an important task or input for applications in robotics, medical or human-computer interaction. In recent years, the success of deep convolutional neural networks and the popularity of low-cost consumer wearable cameras have made hand pose estimation on egocentric images using deep neural networks a hot topic in the computer vision field. This paper proposes a novel deep model for accurate 2D hand pose estimation that combines HOPE-Net, which estimates hand pose, and Mask R-CNN, which provides hand detection and segmentation to localize the hand in the image. First, HOPENet is used to predict the initial 2D hand pose, and the hand features are extracted from an image with a hand in the center, which is cropped from the original image based on Mask RCNN’s output. Then, we combine the initial 2D hand pose and the hand features into a fully connected layer to predict the 2D hand pose correctly. Our experiments show that the proposed model outperforms the original HOPE-Net in 2D hand pose estimation. 
The proposed method’s mean endpoint error (mEPE) is 48.82 pixels, while the mEPE of the 2D HOPE-Net predictor is 86.30 pixels on the First-Person Hand Action dataset.\",\"PeriodicalId\":138642,\"journal\":{\"name\":\"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MAPR56351.2022.9924768\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MAPR56351.2022.9924768","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving the Hand Pose Estimation from Egocentric Vision via HOPE-Net and Mask R-CNN
Hand pose estimation is the task of predicting the position and orientation of the hand and fingers relative to some coordinate system. It is an important input for applications in robotics, medicine, and human-computer interaction. In recent years, the success of deep convolutional neural networks and the popularity of low-cost consumer wearable cameras have made hand pose estimation on egocentric images using deep neural networks a hot topic in the computer vision field. This paper proposes a novel deep model for accurate 2D hand pose estimation that combines HOPE-Net, which estimates the hand pose, and Mask R-CNN, which provides hand detection and segmentation to localize the hand in the image. First, HOPE-Net is used to predict the initial 2D hand pose, and hand features are extracted from a hand-centered image that is cropped from the original frame based on Mask R-CNN's output. Then, we combine the initial 2D hand pose and the hand features in a fully connected layer to produce a refined 2D hand pose prediction. Our experiments show that the proposed model outperforms the original HOPE-Net in 2D hand pose estimation. The proposed method's mean endpoint error (mEPE) is 48.82 pixels, while the mEPE of the 2D HOPE-Net predictor is 86.30 pixels on the First-Person Hand Action dataset.
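For illustration, here is a minimal PyTorch-style sketch of the two-stage idea the abstract describes: a detector localizes the hand, a hand-centered crop is extracted, and a fully connected head fuses an initial 2D pose with features from the crop to refine it. The names RefinedHandPose2D, hand_crop, and mean_endpoint_error, the ResNet-18 feature extractor, the 21-joint layout, and all layer sizes are assumptions for the sketch, not the authors' implementation; HOPE-Net itself is treated as an external black box that supplies the initial 2D joints.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms.functional as TF

NUM_JOINTS = 21  # common 21-joint hand skeleton (wrist + 4 joints per finger)


def hand_crop(image, detector, size=224):
    """Crop the frame around the highest-scoring detection.

    In the paper this localization is done by Mask R-CNN; here a stock
    torchvision Mask R-CNN can serve as a stand-in (see usage below).
    `image` is a float tensor of shape (3, H, W) with values in [0, 1].
    """
    detector.eval()
    with torch.no_grad():
        det = detector([image])[0]
    if len(det["boxes"]) == 0:
        return TF.resize(image, [size, size])  # fallback: no hand found
    x1, y1, x2, y2 = det["boxes"][0].round().int().tolist()
    return TF.resize(image[:, y1:y2, x1:x2], [size, size])


class RefinedHandPose2D(nn.Module):
    """Fuse an initial 2D pose (HOPE-Net's role) with features extracted
    from the hand-centered crop via a fully connected refinement head."""

    def __init__(self, feat_dim=512):
        super().__init__()
        resnet = torchvision.models.resnet18(weights="DEFAULT")
        self.features = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 512, 1, 1)
        self.fuse = nn.Sequential(
            nn.Linear(NUM_JOINTS * 2 + feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, NUM_JOINTS * 2),
        )

    def forward(self, crop, initial_pose_2d):
        # crop: (B, 3, 224, 224) hand-centered crops
        # initial_pose_2d: (B, 21, 2) joints predicted by a HOPE-Net-style model
        feats = self.features(crop).flatten(1)                     # (B, 512)
        fused = torch.cat([initial_pose_2d.flatten(1), feats], 1)  # (B, 42 + 512)
        return self.fuse(fused).view(-1, NUM_JOINTS, 2)            # refined 2D joints


def mean_endpoint_error(pred, gt):
    """mEPE: mean Euclidean distance in pixels between predicted and
    ground-truth 2D joints, the metric reported in the abstract."""
    return torch.linalg.norm(pred - gt, dim=-1).mean()


# Example wiring (illustrative only):
# detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
# crop = hand_crop(frame, detector).unsqueeze(0)   # (1, 3, 224, 224)
# refined = RefinedHandPose2D()(crop, initial_pose_from_hope_net)
```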