6D NewtonianVAE: 6-DoF object pose estimation and control method for robotic tasks via learning from multi-view visual information

IF 0.8 Q4 ROBOTICS
Mai Terashima, Ryo Okumura, Pedro Miguel Uriguen Eljuri, Katsuyoshi Maeyama, Yuanyuan Jia, Tadahiro Taniguchi
DOI: 10.1007/s10015-025-01026-0
Journal: Artificial Life and Robotics, 30(3), 472–483
Published: 2025-05-21 (Journal Article)
PDF: https://link.springer.com/content/pdf/10.1007/s10015-025-01026-0.pdf
Citations: 0

Abstract

In this study, we propose a method that learns a latent space representing 6-DoF poses and performs 6-DoF control in that latent space using NewtonianVAE. NewtonianVAE, a type of world model based on the Variational Autoencoder (VAE), learns the dynamics of the environment as a latent space from observational data and performs proportional control based on the position estimated in the latent space. However, previous research has not demonstrated 6-DoF pose estimation and control with NewtonianVAE. We therefore propose 6D NewtonianVAE, which extends the latent space with a rotation vector so that it represents 6-DoF poses and supports 6-DoF control based on the estimated poses. Experimental results show that our method achieves 6-DoF control with an accuracy within 7 mm and 0.02 rad in a real-world environment, and that 6-DoF control remains possible even in unseen environments. Our approach enables end-to-end 6-DoF pose estimation and control without annotated data. It also eliminates the need for RGB-D or point-cloud data, relying solely on RGB images and thereby reducing implementation and computational costs.
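The control scheme the abstract describes — proportional control applied to the pose estimated in the learned latent space — can be sketched minimally as follows. This is an illustrative sketch, not the authors' implementation: `encode` is a hypothetical placeholder for the trained NewtonianVAE encoder (which would map an RGB image to a 6-D latent pose: translation plus rotation vector), and the gain value is an assumption.

```python
import numpy as np

def encode(observation: np.ndarray) -> np.ndarray:
    """Placeholder encoder. A trained NewtonianVAE would map an RGB
    observation to a 6-D latent pose (x, y, z, rotation vector)."""
    return observation[:6]

def proportional_action(x: np.ndarray, x_goal: np.ndarray,
                        gain: float = 0.5) -> np.ndarray:
    """P-control in latent space: the action is proportional to the
    error between the goal pose and the estimated current pose."""
    return gain * (x_goal - x)

# Toy rollout on the latent state directly: the error contracts by the
# gain factor at every step, so the pose converges to the goal.
x = np.zeros(6)
x_goal = np.array([0.1, 0.2, 0.3, 0.01, 0.0, -0.01])
for _ in range(50):
    x = x + proportional_action(x, x_goal)
print(np.allclose(x, x_goal, atol=1e-6))
```

In the paper's setting the state `x` is not available directly; each step would re-encode the current camera image, `x = encode(observation)`, before computing the action.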

Journal metrics: CiteScore 2.00 · Self-citation rate 22.20% · Articles per year: 101
Journal description: Artificial Life and Robotics is an international journal publishing original technical papers and authoritative state-of-the-art reviews on the development of new technologies concerning artificial life and robotics, especially computer-based simulation and hardware for the twenty-first century. This journal covers a broad multidisciplinary field, including areas such as artificial brain research, artificial intelligence, artificial life, artificial living, artificial mind research, brain science, chaos, cognitive science, complexity, computer graphics, evolutionary computations, fuzzy control, genetic algorithms, innovative computations, intelligent control and modelling, micromachines, micro-robot world cup soccer tournament, mobile vehicles, neural networks, neurocomputers, neurocomputing technologies and applications, robotics, robust virtual engineering, and virtual reality. Hardware-oriented submissions are particularly welcome. Publishing body: International Symposium on Artificial Life and Robotics. Editor-in-Chief: Hiroshi Tanaka, Hatanaka R Apartment 101, Hatanaka 8-7A, Ooaza-Hatanaka, Oita city, Oita, Japan 870-0856. ©International Symposium on Artificial Life and Robotics