Gesture-world environment technology for mobile manipulation

K. Hoshino, Takuya Kasahara, Naoki Igo, Motomasa Tomida, T. Mukai, Kinji Nishi, Hajime Kotani
{"title":"Gesture-world environment technology for mobile manipulation","authors":"K. Hoshino, Takuya Kasahara, Naoki Igo, Motomasa Tomida, T. Mukai, Kinji Nishi, Hajime Kotani","doi":"10.1109/SII.2010.5708323","DOIUrl":null,"url":null,"abstract":"The aim of this paper is to propose the technology to allow people to control robots by means of everyday gestures without using sensors or controllers. The hand pose estimation we propose reduces the number of image features per data set to 64, which makes the construction of a large-scale database possible. This has also made it possible to estimate the 3D hand poses of unspecified users with individual differences without sacrificing estimation accuracy. Specifically, the system we propose involved the construction in advance of a large database comprising three elements: hand joint information including the wrist, low-order proportional information on the hand images to indicate the rough hand shape, and hand pose data comprised of 64 image features per data set. To estimate a hand pose, the system first performs coarse screening to select similar data sets from the database based on the three hand proportions of the input image, and then performed a detailed search to find the data set most similar to the input images based on 64 image features. Using subjects with varying hand poses, we performed joint angle estimation using our hand pose estimation system comprised of 750,000 hand pose data sets, achieving roughly the same average estimation error as our previous system, about 2 degrees. However, the standard deviation of the estimation error was smaller than in our previous system having roughly 30,000 data sets: down from 26.91 degrees to 14.57 degrees for the index finger PIP joint and from 15.77 degrees to 10.28 degrees for the thumb. We were thus able to confirm an improvement in estimation accuracy, even for unspecified users. Further, the processing speed, using a notebook PC of normal specifications and a compact high-speed camera, was about 80 fps or more, including image capture, hand pose estimation, and CG rendering and robot control of the estimation result.","PeriodicalId":334652,"journal":{"name":"2010 IEEE/SICE International Symposium on System Integration","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE/SICE International Symposium on System Integration","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SII.2010.5708323","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The aim of this paper is to propose a technology that allows people to control robots by means of everyday gestures, without using sensors or controllers. The hand pose estimation method we propose reduces the number of image features per data set to 64, which makes the construction of a large-scale database possible. This has also made it possible to estimate the 3D hand poses of unspecified users with individual differences, without sacrificing estimation accuracy. Specifically, the proposed system involves the advance construction of a large database comprising three elements: hand joint information including the wrist, low-order proportional information on the hand images indicating the rough hand shape, and hand pose data consisting of 64 image features per data set. To estimate a hand pose, the system first performs coarse screening to select similar data sets from the database based on the three hand proportions of the input image, and then performs a detailed search to find the data set most similar to the input image based on the 64 image features. Using subjects with varying hand poses, we performed joint angle estimation with our hand pose estimation system, which comprises 750,000 hand pose data sets, achieving roughly the same average estimation error as our previous system: about 2 degrees. However, the standard deviation of the estimation error was smaller than in our previous system of roughly 30,000 data sets: down from 26.91 degrees to 14.57 degrees for the index finger PIP joint, and from 15.77 degrees to 10.28 degrees for the thumb. We were thus able to confirm an improvement in estimation accuracy, even for unspecified users. Further, the processing speed, using a notebook PC of normal specifications and a compact high-speed camera, was about 80 fps or more, including image capture, hand pose estimation, CG rendering of the estimation result, and robot control.
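The coarse-to-fine lookup described in the abstract can be sketched as follows. This is a minimal NumPy illustration, assuming Euclidean distance for both the three-proportion screening and the 64-feature matching, and a fixed candidate count; the paper does not specify its similarity measures or thresholds, and the names here (`HandPoseDatabase`, `estimate_pose`, `n_candidates`) are hypothetical.

```python
# Minimal sketch of the two-stage (coarse-to-fine) database search.
# Assumptions not stated in the paper: Euclidean distance for both stages
# and a fixed candidate count; HandPoseDatabase / estimate_pose are
# hypothetical names used for illustration only.
import numpy as np


class HandPoseDatabase:
    """Pre-built database of hand pose entries, one row per data set.

    proportions : (N, 3)  low-order proportion values (rough hand shape)
    features    : (N, 64) image features per data set
    joints      : (N, J)  joint angles, including the wrist
    """

    def __init__(self, proportions, features, joints):
        self.proportions = np.asarray(proportions, dtype=np.float64)
        self.features = np.asarray(features, dtype=np.float64)
        self.joints = np.asarray(joints, dtype=np.float64)

    def estimate_pose(self, query_proportions, query_features, n_candidates=1000):
        # Stage 1 (coarse screening): keep the n_candidates entries whose
        # three proportion values are closest to those of the input image.
        d_coarse = np.linalg.norm(self.proportions - query_proportions, axis=1)
        k = min(n_candidates, d_coarse.size)
        candidates = np.argpartition(d_coarse, k - 1)[:k]

        # Stage 2 (detailed search): among the candidates, pick the entry
        # whose 64 image features are most similar to the input image.
        d_fine = np.linalg.norm(self.features[candidates] - query_features, axis=1)
        best = candidates[np.argmin(d_fine)]

        # The stored joint angles of the best match are the pose estimate.
        return self.joints[best]


# Toy usage with random data (the paper's actual database held 750,000 sets).
rng = np.random.default_rng(0)
db = HandPoseDatabase(
    proportions=rng.random((50_000, 3)),
    features=rng.random((50_000, 64)),
    joints=rng.random((50_000, 20)),  # 20 joint angles is an assumed count
)
joint_angles = db.estimate_pose(rng.random(3), rng.random(64))
```

The two-stage structure is what makes a 750,000-entry database tractable: a cheap comparison in three dimensions prunes most of the database before the more expensive 64-dimensional comparison runs on the surviving candidates.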