Kaede Hayashi, Wenru Zheng, Lotfi El Hafi, Y. Hagiwara, T. Taniguchi
Published in: 2021 IEEE/SICE International Symposium on System Integration (SII), 2021-01-11
DOI: 10.1109/IEEECONF49454.2021.9382768 (https://doi.org/10.1109/IEEECONF49454.2021.9382768)
Bidirectional Generation of Object Images and Positions using Deep Generative Models for Service Robotics Applications
Introducing systems and robots for automated services is important for reducing running costs and improving operational efficiency in the retail industry. To this end, we develop a system that enables robot agents to display products in stores. The main obstacle to automating product display with common supervised methods is the large amount of data required to recognize product categories and arrangements across a variety of store layouts. To address this problem, we propose a crossmodal inference system based on a joint multimodal variational autoencoder (JMVAE) that learns the relationship between object image information and location information observed on site by robot agents. In our experiments, we created a simulation environment replicating a convenience store, in which a robot agent observes an object image and its 3D coordinates, and verified whether the JMVAE can learn a shared representation of the two modalities and generate each one from the other, that is, images from 3D coordinates and 3D coordinates from images.
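The abstract does not state the training objective, so the following is only a minimal sketch of how a JMVAE-style loss over an image modality and a position modality might be assembled. It assumes the JMVAE-kl variant from the JMVAE literature, diagonal Gaussian encoders, and a standard normal prior. All function and parameter names (`kl_diag_gaussians`, `jmvae_kl_objective`, `alpha`) are hypothetical and not taken from the paper. The reconstruction log-likelihoods are passed in as plain numbers, because the networks that would produce them are outside the scope of this sketch.

```python
import math
from typing import Sequence, Tuple

# A diagonal Gaussian is represented as (means, log-variances).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gaussians(q: Gaussian, p: Gaussian) -> float:
    """Closed-form KL(q || p) between two diagonal Gaussians."""
    mu_q, logvar_q = q
    mu_p, logvar_p = p
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def jmvae_kl_objective(
    log_lik_image: float,      # log p(image | z), assumed computed elsewhere
    log_lik_position: float,   # log p(position | z), assumed computed elsewhere
    joint: Gaussian,           # q(z | image, position)
    image_only: Gaussian,      # q(z | image)
    position_only: Gaussian,   # q(z | position)
    alpha: float = 1.0,        # hypothetical weight on the unimodal KL terms
) -> float:
    """Return the quantity to maximize for one sample (assumed JMVAE-kl form)."""
    dim = len(joint[0])
    prior: Gaussian = ([0.0] * dim, [0.0] * dim)  # standard normal prior
    # Joint evidence lower bound over both modalities.
    elbo = log_lik_image + log_lik_position - kl_diag_gaussians(joint, prior)
    # Pull each single-modality encoder toward the joint encoder so that
    # either modality alone can later be used to infer the other.
    unimodal_gap = (kl_diag_gaussians(joint, image_only)
                    + kl_diag_gaussians(joint, position_only))
    return elbo - alpha * unimodal_gap


if __name__ == "__main__":
    joint = ([1.0], [0.0])
    image_only = ([0.0], [0.0])
    position_only = ([1.0], [0.0])
    print(jmvae_kl_objective(0.0, 0.0, joint, image_only, position_only, alpha=2.0))
```

Under these assumptions, bidirectional generation at test time would encode only the available modality (for example `q(z | image)`) and decode the missing one (the 3D position) from the resulting latent variable, or the reverse.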