Route optimization for autonomous bulldozer by distributed deep reinforcement learning
Yasuhiro Osaka, Naoya Odajima, Y. Uchimura
2021 IEEE International Conference on Mechatronics (ICM), published 2021-03-07
DOI: 10.1109/ICM46511.2021.9385686 (https://doi.org/10.1109/ICM46511.2021.9385686)
Since it was shown that DQN-based reinforcement learning can exceed human scores in Atari 2600 video games, a variety of deep reinforcement learning methods have been researched. This paper proposes a method for controlling a bulldozer autonomously by learning the sediment leveling route with PPO, an algorithm that supports distributed deep reinforcement learning. A simulator was developed specifically for this work to reproduce the behavior of small, uniform sediment. By incorporating into the agent network an LSTM that processes the input state as time-series data, the agent leveled more than 95% of the sediment in the target area on average. In addition, generalization performance was evaluated by giving the agent initial setups with conditions not seen during training.
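
For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched as follows. This is a minimal illustration of the standard PPO objective, not the authors' implementation: the function name `ppo_clipped_objective`, its argument names, and the clip range of 0.2 are assumptions chosen for the example, and the abstract does not state the hyperparameters actually used. In a distributed setup, several workers would typically collect rollouts in parallel and the objective would be evaluated on the pooled batch.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps=0.2 is an assumed
    default, not a value reported in the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Clipping removes the incentive to move the policy far from the one that collected the data, which is one reason the same batch can be reused for several gradient steps across parallel workers.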