Stochastic Observation Prediction for Efficient Reinforcement Learning in Robotics
Authors: Shisheng Wang, Hideki Nakayama
Venue: 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)
Published: 2021-09-01
DOI: 10.1109/MIPR51284.2021.00027 (https://doi.org/10.1109/MIPR51284.2021.00027)
Although recent progress in deep learning has enabled reinforcement learning (RL) algorithms to achieve human-level performance in retro video games within a short training time, their application to real-world robotics remains limited. The conventional RL procedure requires agents to interact with the environment, but interactions with the physical world cannot be easily parallelized or accelerated as in other tasks. Moreover, the gap between the real world and simulation makes it harder to transfer a policy trained in simulators to physical robots. Thus, we propose a model-based method to mitigate the interaction overhead of real-world robotic tasks. In particular, our model incorporates an autoencoder, a recurrent network, and a generative network to make stochastic predictions of observations. We conduct experiments on a collision avoidance task for disc-like robots and show that the generative model can serve as a virtual RL environment. Our method lowers interaction overhead because inference of deep neural networks on GPUs is faster than observing transitions in the real environment, and it can replace the real RL environment for rollouts of limited length.
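To make the idea of a learned model standing in for the environment more concrete, here is a minimal sketch of the three components named in the abstract (autoencoder, recurrent network, generative network) wired into an environment-like interface. It is not the authors' implementation: the class name `LearnedEnv`, the helper `rollout`, all dimensions, and the random linear maps used in place of trained networks are assumptions made for illustration. Rewards and termination on collision are left out because the abstract does not describe them.

```python
import math
import random


class LearnedEnv:
    """Toy stand-in for a learned, stochastic observation model.

    All weights are random placeholders (an assumption for this sketch);
    a real model would be trained on transitions recorded from the robot.
    """

    def __init__(self, obs_dim=4, act_dim=2, latent_dim=3, hidden_dim=5,
                 max_rollout_len=10, seed=0):
        self.rng = random.Random(seed)
        self.max_rollout_len = max_rollout_len
        self.hidden_dim = hidden_dim
        # Autoencoder halves: observation -> latent and latent -> observation.
        self.w_enc = self._matrix(latent_dim, obs_dim)
        self.w_dec = self._matrix(obs_dim, latent_dim)
        # Recurrent cell: (hidden, latent, action) -> hidden.
        self.w_rec = self._matrix(hidden_dim, hidden_dim + latent_dim + act_dim)
        # Generative head: hidden -> mean and log-std of the next latent.
        self.w_mean = self._matrix(latent_dim, hidden_dim)
        self.w_logstd = self._matrix(latent_dim, hidden_dim)

    def _matrix(self, rows, cols):
        return [[self.rng.gauss(0.0, 0.3) for _ in range(cols)]
                for _ in range(rows)]

    @staticmethod
    def _matvec(m, v):
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def reset(self, real_obs):
        """Seed the rollout with one observation from the real environment."""
        self.latent = self._matvec(self.w_enc, real_obs)
        self.hidden = [0.0] * self.hidden_dim
        self.t = 0
        return self._matvec(self.w_dec, self.latent)

    def step(self, action):
        """Predict the next observation without touching the real robot."""
        inp = self.hidden + self.latent + list(action)
        self.hidden = [math.tanh(h) for h in self._matvec(self.w_rec, inp)]
        mean = self._matvec(self.w_mean, self.hidden)
        logstd = self._matvec(self.w_logstd, self.hidden)
        # Stochastic prediction: sample the next latent rather than use the mean.
        self.latent = [self.rng.gauss(m, math.exp(s))
                       for m, s in zip(mean, logstd)]
        self.t += 1
        # Cut the rollout off at a fixed length.
        truncated = self.t >= self.max_rollout_len
        return self._matvec(self.w_dec, self.latent), truncated


def rollout(env, real_obs, policy):
    """Collect one imagined trajectory starting from a real observation."""
    obs = env.reset(real_obs)
    trajectory = [obs]
    truncated = False
    while not truncated:
        obs, truncated = env.step(policy(obs))
        trajectory.append(obs)
    return trajectory
```

A policy could then be trained on trajectories such as `rollout(LearnedEnv(max_rollout_len=5), [0.1, 0.2, 0.3, 0.4], lambda obs: [0.0, 0.0])`, returning to the real environment only to obtain fresh starting observations. The `max_rollout_len` cap reflects the abstract's point that the model replaces the real environment only for rollouts of limited length.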