Design of Biped Robot Using Reinforcement Learning and Asynchronous Actor-Critical Agent (A3C) Algorithm

M. Navaneethakrishnan, P. Pushpa, T. T, T. A. Mohanaprakash, Batini Dhanwanth, Faraz Ahmed A S
{"title":"Design of Biped Robot Using Reinforcement Learning and Asynchronous Actor-Critical Agent (A3C) Algorithm","authors":"M. Navaneethakrishnan, P. Pushpa, T. T, T. A. Mohanaprakash, Batini Dhanwanth, Faraz Ahmed A S","doi":"10.1109/ViTECoN58111.2023.10156947","DOIUrl":null,"url":null,"abstract":"The creation of a humanoid robot necessitates a remarkable interdisciplinary effort spanning engineering, mathematics, software, and machine learning. In this work, we investigate the policy-based algorithm known as Reinforce, which is a deep reinforcement method. The goal of policy-based approaches is to directly optimize the policy without the utilizes of a value function. Reinforce specifically belongs to the Policy-Gradient techniques subclass of Policy-Based techniques. This subclass uses gradient ascent to estimate the weights of the ideal policy, directly optimizing the policy. In order to stabilize the training by lowering the variance, a hybrid architecture combining policy-based and value-based methodologies is proposed in this paper. Asynchronous Advantage Actor-Critic (A3C), a hybrid technique, trains agents in robotic environments by employing Stable-Baselines3. It trains two agents to walk, one on two legs and the other on a spider moment. According to the experimental findings, both robots are able to recognize the target's orientation, move to the proper location, and then successfully raise the target together.","PeriodicalId":407488,"journal":{"name":"2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ViTECoN58111.2023.10156947","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The creation of a humanoid robot necessitates a remarkable interdisciplinary effort spanning engineering, mathematics, software, and machine learning. In this work, we investigate Reinforce, a policy-based deep reinforcement learning algorithm. Policy-based approaches optimize the policy directly, without the use of a value function. Reinforce belongs to the policy-gradient subclass of policy-based techniques, which estimates the weights of the optimal policy by gradient ascent on the policy itself. To stabilize training by lowering the variance of these gradient estimates, this paper proposes a hybrid architecture combining policy-based and value-based methodologies: the Asynchronous Advantage Actor-Critic (A3C). Agents are trained in robotic environments using Stable-Baselines3: one agent learns to walk on two legs and the other with a spider-like movement. The experimental findings show that both robots are able to recognize the target's orientation, move to the proper location, and then successfully raise the target together.
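For context, the gradient-ascent update that Reinforce performs is the textbook formulation below (not quoted from the paper), where theta are the policy weights and G_t is the return from step t:

    \nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t \right]

The actor-critic variant replaces G_t with an advantage estimate A(s_t, a_t) \approx G_t - V_\phi(s_t) from a learned value function V_\phi; this is the variance reduction that motivates the hybrid policy-based/value-based architecture described in the abstract.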
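A minimal training sketch, not the authors' code: Stable-Baselines3 ships A2C, the synchronous counterpart of A3C, so the two-legged walker could be trained as below. The BipedalWalker-v3 environment id and the timestep budget are assumptions, since the paper does not name its environments.

    # Train a two-legged walker with A2C, the synchronous counterpart of A3C.
    import gymnasium as gym
    from stable_baselines3 import A2C

    env = gym.make("BipedalWalker-v3")        # assumed two-legged walker benchmark
    model = A2C("MlpPolicy", env, verbose=1)  # MLP actor-critic policy
    model.learn(total_timesteps=1_000_000)    # hypothetical training budget
    model.save("a2c_biped_walker")

    # Roll out the learned policy for one episode.
    obs, _ = env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

The spider-gait agent would be trained the same way, swapping in a quadruped environment id.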