Route optimization for autonomous bulldozer by distributed deep reinforcement learning

2021 IEEE International Conference on Mechatronics (ICM) | Pub Date: 2021-03-07 | DOI: 10.1109/ICM46511.2021.9385686

Yasuhiro Osaka, Naoya Odajima, Y. Uchimura


Citations: 3


Route optimization for autonomous bulldozer by distributed deep reinforcement learning

Since a DQN-based reinforcement learning method was shown to exceed human scores in Atari 2600 video games, various deep reinforcement learning methods have been researched. This paper proposes a method for controlling a bulldozer autonomously by learning the sediment leveling route with PPO, an algorithm that supports distributed deep reinforcement learning. An original simulator was developed to reproduce the behavior of small, uniform sediment. By incorporating into the agent network an LSTM that processes the input state as time-series data, the agent delivered, on average, more than 95% of the sediment to the target area. In addition, generalization performance was evaluated by using conditions not seen during training as initial setups.
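The abstract names PPO but gives no implementation details. As a rough illustration only, the sketch below computes the standard clipped surrogate objective commonly associated with PPO. The function name `ppo_clip_objective`, the clipping range `eps=0.2`, and the plain-list inputs are assumptions made for this example and are not taken from the paper; the authors' network, reward, and distributed setup are not reproduced here.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same time steps.
    eps: clipping range; 0.2 is a common default, assumed here.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio of 2 with a positive advantage is capped at 1 + eps.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

Clipping removes the incentive to move the policy far from the one that collected the data. This is one reason the objective suits setups where several workers gather experience in parallel, although how the paper distributes training is not stated in the abstract.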
