Design of Biped Robot Using Reinforcement Learning and Asynchronous Actor-Critical Agent (A3C) Algorithm

2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN)
Pub Date: 2023-05-05
DOI: 10.1109/ViTECoN58111.2023.10156947

M. Navaneethakrishnan, P. Pushpa, T. T, T. A. Mohanaprakash, Batini Dhanwanth, Faraz Ahmed A S

Citations: 0

Abstract

The creation of a humanoid robot requires an interdisciplinary effort spanning engineering, mathematics, software, and machine learning. In this work, we investigate REINFORCE, a policy-based deep reinforcement learning algorithm. Policy-based approaches optimize the policy directly, without using a value function. REINFORCE belongs to the policy-gradient subclass of policy-based techniques, which uses gradient ascent to estimate the weights of the optimal policy. To stabilize training by lowering the variance, this paper proposes a hybrid architecture that combines policy-based and value-based methods. Asynchronous Advantage Actor-Critic (A3C), one such hybrid technique, is used to train agents in robotic environments with Stable-Baselines3. Two agents are trained to walk, one on two legs and the other with a spider-like movement. According to the experimental findings, both robots are able to recognize the target's orientation, move to the proper location, and then successfully raise the target together.
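The variance argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation and does not use Stable-Baselines3 or a robotic environment. It assumes a hypothetical two-action bandit with a softmax policy, and compares single-sample policy-gradient estimates weighted by the raw reward (REINFORCE style) against estimates weighted by the advantage, that is, the reward minus a value baseline (actor-critic style). All function names, reward values, and parameters below are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, action):
    """Gradient of log pi(action) w.r.t. each preference of a softmax policy."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def sample_gradients(prefs, mean_rewards, baseline, n, rng):
    """Return n single-sample gradient estimates for prefs[0].

    baseline = 0.0 weights the score by the raw reward (REINFORCE style);
    baseline = V weights it by the advantage (reward - V), as a critic would.
    """
    probs = softmax(prefs)
    estimates = []
    for _ in range(n):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, 1.0)  # noisy reward
        advantage = reward - baseline
        estimates.append(advantage * grad_log_prob(probs, action)[0])
    return estimates


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def compare(seed=0, n=20000):
    """Return (mean, variance) of both estimators on a toy two-action problem."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]            # assumed: uniform initial policy
    mean_rewards = [10.0, 11.0]   # assumed: toy reward means
    probs = softmax(prefs)
    # Exact expected reward under the policy, standing in for a learned critic.
    value = sum(p * r for p, r in zip(probs, mean_rewards))
    plain = sample_gradients(prefs, mean_rewards, 0.0, n, rng)
    with_baseline = sample_gradients(prefs, mean_rewards, value, n, rng)
    return mean(plain), variance(plain), mean(with_baseline), variance(with_baseline)


if __name__ == "__main__":
    m_plain, v_plain, m_base, v_base = compare()
    print(f"reward-weighted:    mean={m_plain:.3f} var={v_plain:.3f}")
    print(f"advantage-weighted: mean={m_base:.3f} var={v_base:.3f}")
```

Both estimators target the same expected gradient (here -0.25 for the first preference), but subtracting the baseline removes the large constant part of the reward, so the spread of the advantage-weighted samples is much smaller. In an actor-critic method such as A3C the baseline is a learned value function rather than the exact expectation used in this sketch.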
