Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent

Stephen Dankwa, Wenfeng Zheng
{"title":"双延迟DDPG:一种模拟智能机器人连续运动的深度强化学习技术","authors":"Stephen Dankwa, Wenfeng Zheng","doi":"10.1145/3387168.3387199","DOIUrl":null,"url":null,"abstract":"In this current research, Twin-Delayed DDPG (TD3) algorithm has been used to solve the most challenging virtual Artificial Intelligence application by training a 4-ant-legged robot as an Intelligent Agent to run across a field. Twin-Delayed DDPG (TD3) is an incredibly smart AI model of a Deep Reinforcement Learning which combines the state-of-the-art methods in Artificial Intelligence. These includes Policy gradient, Actor-Critics, and continuous Double Deep Q-Learning. These Deep Reinforcement Learning approaches trained an Intelligent agent to interact with an environment with automatic feature engineering, that is, necessitating minimal domain knowledge. For the implementation of the TD3, we used a two-layer feedforward neural network of 400 and 300 hidden nodes respectively, with Rectified Linear Units (ReLU) as an activation function between each layer for both the Actor and Critics. We, then added a final tanh unit after the output of the Actor. The Critic receives both the state and action as input to the first layer. Both the network parameters were updated using Adam optimizer. The idea behind the Twin-Delayed DDPG (TD3) is to reduce overestimation bias in Deep Q-Learning with discrete actions which are ineffective in an Actor-Critic domain setting. Based on the Maximum Average Reward over the evaluation time-step, our model achieved an approximate maximum of 2364. Therefore, we can truly say that, TD3 has obviously improved on both the learning speed and performance of the Deep Deterministic Policy Gradient (DDPG) in a challenging environment in a continuous control domain.","PeriodicalId":346739,"journal":{"name":"Proceedings of the 3rd International Conference on Vision, Image and Signal Processing","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"74","resultStr":"{\"title\":\"Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent\",\"authors\":\"Stephen Dankwa, Wenfeng Zheng\",\"doi\":\"10.1145/3387168.3387199\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this current research, Twin-Delayed DDPG (TD3) algorithm has been used to solve the most challenging virtual Artificial Intelligence application by training a 4-ant-legged robot as an Intelligent Agent to run across a field. Twin-Delayed DDPG (TD3) is an incredibly smart AI model of a Deep Reinforcement Learning which combines the state-of-the-art methods in Artificial Intelligence. These includes Policy gradient, Actor-Critics, and continuous Double Deep Q-Learning. These Deep Reinforcement Learning approaches trained an Intelligent agent to interact with an environment with automatic feature engineering, that is, necessitating minimal domain knowledge. For the implementation of the TD3, we used a two-layer feedforward neural network of 400 and 300 hidden nodes respectively, with Rectified Linear Units (ReLU) as an activation function between each layer for both the Actor and Critics. We, then added a final tanh unit after the output of the Actor. The Critic receives both the state and action as input to the first layer. Both the network parameters were updated using Adam optimizer. 
The idea behind the Twin-Delayed DDPG (TD3) is to reduce overestimation bias in Deep Q-Learning with discrete actions which are ineffective in an Actor-Critic domain setting. Based on the Maximum Average Reward over the evaluation time-step, our model achieved an approximate maximum of 2364. Therefore, we can truly say that, TD3 has obviously improved on both the learning speed and performance of the Deep Deterministic Policy Gradient (DDPG) in a challenging environment in a continuous control domain.\",\"PeriodicalId\":346739,\"journal\":{\"name\":\"Proceedings of the 3rd International Conference on Vision, Image and Signal Processing\",\"volume\":\"65 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"74\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 3rd International Conference on Vision, Image and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3387168.3387199\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd International Conference on Vision, Image and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3387168.3387199","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 74

Abstract

In this research, the Twin-Delayed DDPG (TD3) algorithm was used to solve a highly challenging virtual Artificial Intelligence application by training a four-legged ant robot, as an Intelligent Agent, to run across a field. Twin-Delayed DDPG (TD3) is a Deep Reinforcement Learning model that combines state-of-the-art methods in Artificial Intelligence, including Policy Gradients, Actor-Critics, and continuous Double Deep Q-Learning. These Deep Reinforcement Learning approaches train an Intelligent Agent to interact with an environment with automatic feature engineering, that is, requiring minimal domain knowledge. For the implementation of TD3, we used a two-layer feedforward neural network of 400 and 300 hidden nodes respectively, with Rectified Linear Unit (ReLU) activations between the layers, for both the Actor and the Critics, and added a final tanh unit after the output of the Actor. Each Critic receives both the state and the action as input to its first layer. The parameters of both networks were updated using the Adam optimizer. The idea behind Twin-Delayed DDPG (TD3) is to reduce the overestimation bias of Deep Q-Learning, whose discrete-action remedies are ineffective in an Actor-Critic setting. Based on the Maximum Average Reward over the evaluation time-steps, our model achieved an approximate maximum of 2364. Therefore, we can say that TD3 clearly improves on both the learning speed and the performance of the Deep Deterministic Policy Gradient (DDPG) in a challenging continuous-control environment.
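The network layout described in the abstract maps onto a small amount of code. The following is a minimal PyTorch sketch, under stated assumptions, of the Actor and twin Critic networks: two hidden layers of 400 and 300 units, ReLU activations, a final tanh on the Actor output, and each Critic receiving the concatenated state and action in its first layer. The state/action dimensions and the 3e-4 learning rate are illustrative assumptions, not values reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Actor(nn.Module):
    """Deterministic policy: state -> action in [-max_action, max_action]."""

    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.l1 = nn.Linear(state_dim, 400)
        self.l2 = nn.Linear(400, 300)
        self.l3 = nn.Linear(300, action_dim)
        self.max_action = max_action

    def forward(self, state):
        x = F.relu(self.l1(state))
        x = F.relu(self.l2(x))
        # Final tanh unit squashes the output to the valid action range.
        return self.max_action * torch.tanh(self.l3(x))


class Critic(nn.Module):
    """Q-network: (state, action) -> scalar value. TD3 keeps two of these."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        # State and action are concatenated and fed to the first layer.
        self.l1 = nn.Linear(state_dim + action_dim, 400)
        self.l2 = nn.Linear(400, 300)
        self.l3 = nn.Linear(300, 1)

    def forward(self, state, action):
        x = F.relu(self.l1(torch.cat([state, action], dim=1)))
        x = F.relu(self.l2(x))
        return self.l3(x)


# Hypothetical dimensions for illustration; the Ant environment uses larger ones.
state_dim, action_dim, max_action = 8, 2, 1.0
actor = Actor(state_dim, action_dim, max_action)
critic_1 = Critic(state_dim, action_dim)
critic_2 = Critic(state_dim, action_dim)

# Both networks are updated with the Adam optimizer, as stated in the abstract.
# The learning rate is an assumed value for this sketch.
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(
    list(critic_1.parameters()) + list(critic_2.parameters()), lr=3e-4
)
```

The second Critic is the "twin" in Twin-Delayed DDPG: when forming the target value, TD3 takes the minimum of the two Q-estimates, which is what counteracts the overestimation bias mentioned above, and it delays Actor (policy) updates relative to Critic updates.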