Proximal Policy Based Deep Reinforcement Learning Approach for Swarm Robots

2021 Zooming Innovation in Consumer Technologies Conference (ZINC). Pub Date: 2021-05-26. DOI: 10.1109/ZINC52049.2021.9499288

Ziya Tan, Mehmet Karaköse

Citations: 1

Abstract

Artificial intelligence is becoming more prominent in every area of daily life, with ongoing development in fields such as Industry 4.0, security, and education. Deep reinforcement learning is among the most developed families of algorithms in artificial intelligence. This study aims to have three robots in a bounded area learn to move without colliding with one another, with fixed obstacles, or with the boundaries of the field. The robots are trained with deep reinforcement learning using proximal policy optimization (PPO). Instead of value-based methods with a discrete action space, we propose PPO, which handles a continuous action space readily and successfully determines the robots' actions. PPO achieves good results in multi-agent problems, particularly when used with an actor-critic network. We also discuss environment control and learning approaches for swarm behavior, and propose a parameter-sharing, behavior-based method for this study. Finally, the trained model is saved and tested in 9 different environments with different obstacle layouts. With our method, robots can carry out their tasks in enclosed real-world environments without harming anyone or damaging anything.
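The abstract does not give the network architecture, reward, or hyperparameters, so the following is only a minimal sketch of the two ideas it names: the standard PPO clipped surrogate objective over a continuous (Gaussian) action, and parameter sharing, where all three robots evaluate the same policy weights on their own observations. The linear policy, the two-element observations, the sample values, and `clip_eps=0.2` are all illustrative assumptions, not the authors' setup.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a scalar continuous action under N(mean, std^2)."""
    var = std * std
    return -0.5 * ((action - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def shared_policy_mean(weights, observation):
    """Hypothetical linear policy: every robot uses the SAME weights
    (parameter sharing) but feeds in its own observation."""
    return sum(w * o for w, o in zip(weights, observation))


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    clip_eps=0.2 is a commonly used value, assumed here for illustration.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Illustrative (made-up) data: one transition per robot, three robots.
weights_old = [0.5, -0.2]   # shared policy weights before the update
weights_new = [0.6, -0.1]   # shared policy weights after a candidate update
std = 0.5                   # assumed fixed action standard deviation
observations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
actions = [0.4, -0.3, 0.2]
advantages = [1.0, -0.5, 2.0]  # would come from the critic in actor-critic PPO

old_lp = [gaussian_log_prob(a, shared_policy_mean(weights_old, o), std)
          for a, o in zip(actions, observations)]
new_lp = [gaussian_log_prob(a, shared_policy_mean(weights_new, o), std)
          for a, o in zip(actions, observations)]

objective = ppo_clipped_objective(new_lp, old_lp, advantages)
print(f"clipped surrogate objective: {objective:.4f}")
```

Because the transitions from all three robots are pooled into one batch and scored against one set of weights, each robot's experience updates the policy that every robot uses. In a full actor-critic setup the advantages would be estimated by a learned value function rather than supplied by hand.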
