Mai Terashima, Ryo Okumura, Pedro Miguel Uriguen Eljuri, Katsuyoshi Maeyama, Yuanyuan Jia, Tadahiro Taniguchi
{"title":"6D NewtonianVAE:基于多视角视觉信息学习的机器人任务六自由度物体姿态估计与控制方法","authors":"Mai Terashima, Ryo Okumura, Pedro Miguel Uriguen Eljuri, Katsuyoshi Maeyama, Yuanyuan Jia, Tadahiro Taniguchi","doi":"10.1007/s10015-025-01026-0","DOIUrl":null,"url":null,"abstract":"<div><p>In this study, we propose a method for learning a latent space representing 6-DoF poses and performing 6-DoF control in the latent space using NewtonianVAE. NewtonianVAE, a type of world models based on Variational Autoencoder (VAE), can learn the dynamics of the environment as a latent space from observational data and perform proportional control based on the estimated position on the latent space. However, previous research has not demonstrated 6-DoF pose estimation and control using NewtonianVAE. Therefore, we propose 6D NewtonianVAE, which extends the latent space by incorporating the rotation vector to construct the latent space representing 6-DoF poses and perform 6-DoF control based on the estimated poses. Experimental results showed that our method achieves 6-DoF control with an accuracy within 7 mm and 0.02 rad in a real-world. It was also shown that 6-DoF control is possible even in unseen environments. Our approach enables end-to-end 6-DoF pose estimation and control without annotated data. 
It also eliminates the need for RGB-D or point cloud data and relies solely on RGB images, reducing implementation and computational costs.</p></div>","PeriodicalId":46050,"journal":{"name":"Artificial Life and Robotics","volume":"30 3","pages":"472 - 483"},"PeriodicalIF":0.8000,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10015-025-01026-0.pdf","citationCount":"0","resultStr":"{\"title\":\"6D NewtonianVAE: 6-DoF object pose estimation and control method for robotic tasks via learning from multi-view visual information\",\"authors\":\"Mai Terashima, Ryo Okumura, Pedro Miguel Uriguen Eljuri, Katsuyoshi Maeyama, Yuanyuan Jia, Tadahiro Taniguchi\",\"doi\":\"10.1007/s10015-025-01026-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In this study, we propose a method for learning a latent space representing 6-DoF poses and performing 6-DoF control in the latent space using NewtonianVAE. NewtonianVAE, a type of world models based on Variational Autoencoder (VAE), can learn the dynamics of the environment as a latent space from observational data and perform proportional control based on the estimated position on the latent space. However, previous research has not demonstrated 6-DoF pose estimation and control using NewtonianVAE. Therefore, we propose 6D NewtonianVAE, which extends the latent space by incorporating the rotation vector to construct the latent space representing 6-DoF poses and perform 6-DoF control based on the estimated poses. Experimental results showed that our method achieves 6-DoF control with an accuracy within 7 mm and 0.02 rad in a real-world. It was also shown that 6-DoF control is possible even in unseen environments. Our approach enables end-to-end 6-DoF pose estimation and control without annotated data. 
It also eliminates the need for RGB-D or point cloud data and relies solely on RGB images, reducing implementation and computational costs.</p></div>\",\"PeriodicalId\":46050,\"journal\":{\"name\":\"Artificial Life and Robotics\",\"volume\":\"30 3\",\"pages\":\"472 - 483\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2025-05-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10015-025-01026-0.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Life and Robotics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10015-025-01026-0\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Life and Robotics","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s10015-025-01026-0","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ROBOTICS","Score":null,"Total":0}
6D NewtonianVAE: 6-DoF object pose estimation and control method for robotic tasks via learning from multi-view visual information
In this study, we propose a method for learning a latent space that represents 6-DoF poses and for performing 6-DoF control in that latent space using NewtonianVAE. NewtonianVAE, a type of world model based on the Variational Autoencoder (VAE), can learn the dynamics of the environment as a latent space from observational data and perform proportional control based on the position estimated in the latent space. However, previous research has not demonstrated 6-DoF pose estimation and control using NewtonianVAE. We therefore propose 6D NewtonianVAE, which extends the latent space by incorporating a rotation vector, so that the latent space represents 6-DoF poses and 6-DoF control can be performed on the estimated poses. Experimental results showed that our method achieves 6-DoF control with an accuracy within 7 mm and 0.02 rad in a real-world environment. It was also shown that 6-DoF control is possible even in unseen environments. Our approach enables end-to-end 6-DoF pose estimation and control without annotated data. It also eliminates the need for RGB-D or point-cloud data, relying solely on RGB images and thereby reducing implementation and computational costs.
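The control scheme the abstract describes, proportional control applied to a pose estimated in the latent space, can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes the 6-D latent vector concatenates a translation and a rotation vector, assumes an illustrative gain `K = 0.5`, and assumes toy linear latent dynamics in place of the learned NewtonianVAE dynamics.

```python
import numpy as np

def p_control(z_current: np.ndarray, z_goal: np.ndarray, gain: float = 0.5) -> np.ndarray:
    """Proportional action driving the estimated latent pose toward the goal.

    z_current, z_goal: assumed 6-D latents [x, y, z, r1, r2, r3]
    (translation in metres + rotation vector in radians); the layout
    and the gain are illustrative assumptions, not the paper's values.
    """
    return gain * (z_goal - z_current)

# Toy rollout under assumed linear latent dynamics z' = z + a.
z = np.zeros(6)
z_goal = np.array([0.10, -0.05, 0.20, 0.0, 0.0, 0.3])
for _ in range(20):
    z = z + p_control(z, z_goal)
print(np.allclose(z, z_goal, atol=1e-4))  # the error shrinks geometrically toward the goal
```

In the actual method, `z_current` would come from encoding the current multi-view RGB observations and the action would be executed by the robot, with the learned dynamics replacing the toy update above.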