6D NewtonianVAE: 6-DoF object pose estimation and control method for robotic tasks via learning from multi-view visual information

IF 0.8 Q4 ROBOTICS
Mai Terashima, Ryo Okumura, Pedro Miguel Uriguen Eljuri, Katsuyoshi Maeyama, Yuanyuan Jia, Tadahiro Taniguchi
DOI: 10.1007/s10015-025-01026-0
Journal: Artificial Life and Robotics, 30(3), 472–483
Published: 2025-05-21 (Journal Article)
PDF: https://link.springer.com/content/pdf/10.1007/s10015-025-01026-0.pdf
Citations: 0

Abstract

In this study, we propose a method that learns a latent space representing 6-DoF poses and performs 6-DoF control in that latent space using NewtonianVAE. NewtonianVAE, a type of world model based on the Variational Autoencoder (VAE), learns the dynamics of the environment as a latent space from observational data and performs proportional control based on the position estimated in the latent space. However, previous research has not demonstrated 6-DoF pose estimation and control with NewtonianVAE. We therefore propose 6D NewtonianVAE, which extends the latent space with a rotation vector so that it represents 6-DoF poses and supports 6-DoF control based on the estimated poses. Experimental results show that our method achieves 6-DoF control with an accuracy within 7 mm and 0.02 rad in a real-world environment, and that 6-DoF control remains possible even in unseen environments. Our approach enables end-to-end 6-DoF pose estimation and control without annotated data. It also eliminates the need for RGB-D or point-cloud data, relying solely on RGB images and thereby reducing implementation and computational costs.
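The control scheme the abstract describes — proportional control applied to the pose estimated in the learned latent space — can be sketched minimally as follows. This is an illustrative sketch, not the authors' implementation: `encode` is a hypothetical placeholder for the trained NewtonianVAE encoder (which would map an RGB image to a 6-D latent pose: translation plus rotation vector), and the gain value is an assumption.

```python
import numpy as np

def encode(observation: np.ndarray) -> np.ndarray:
    """Placeholder encoder. A trained NewtonianVAE would map an RGB
    observation to a 6-D latent pose (x, y, z, rotation vector)."""
    return observation[:6]

def proportional_action(x: np.ndarray, x_goal: np.ndarray,
                        gain: float = 0.5) -> np.ndarray:
    """P-control in latent space: the action is proportional to the
    error between the goal pose and the estimated current pose."""
    return gain * (x_goal - x)

# Toy rollout on the latent state directly: the error contracts by the
# gain factor at every step, so the pose converges to the goal.
x = np.zeros(6)
x_goal = np.array([0.1, 0.2, 0.3, 0.01, 0.0, -0.01])
for _ in range(50):
    x = x + proportional_action(x, x_goal)
print(np.allclose(x, x_goal, atol=1e-6))
```

In the paper's setting the state `x` is not available directly; each step would re-encode the current camera image, `x = encode(observation)`, before computing the action.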

Journal metrics: CiteScore 2.00 · Self-citation rate 22.20% · Articles per year: 101
Journal description: Artificial Life and Robotics is an international journal publishing original technical papers and authoritative state-of-the-art reviews on the development of new technologies concerning artificial life and robotics, especially computer-based simulation and hardware for the twenty-first century. This journal covers a broad multidisciplinary field, including areas such as artificial brain research, artificial intelligence, artificial life, artificial living, artificial mind research, brain science, chaos, cognitive science, complexity, computer graphics, evolutionary computations, fuzzy control, genetic algorithms, innovative computations, intelligent control and modelling, micromachines, micro-robot world cup soccer tournament, mobile vehicles, neural networks, neurocomputers, neurocomputing technologies and applications, robotics, robust virtual engineering, and virtual reality. Hardware-oriented submissions are particularly welcome. Publishing body: International Symposium on Artificial Life and Robotics. Editor-in-Chief: Hiroshi Tanaka, Hatanaka R Apartment 101, Hatanaka 8-7A, Ooaza-Hatanaka, Oita city, Oita, Japan 870-0856. ©International Symposium on Artificial Life and Robotics