{"title":"Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method","authors":"Hieu Trung Nguyen, Khang Tran, N. H. Luong","doi":"10.1109/NICS54270.2021.9701549","DOIUrl":null,"url":null,"abstract":"Hybridizations of Deep Reinforcement Learning (DRL) and Evolution Computation (EC) methods have recently showed considerable successes in a variety of high dimensional physical control tasks. These hybrid frameworks offer more robust mechanisms of exploration and exploitation in the policy network parameter search space when stabilizing gradient-based updates of DRL algorithms with population-based operations adopted from EC methods. In this paper, we propose a novel hybrid framework that effectively combines the efficiency of DRL updates and the stability of EC populations. We experiment with integrating the Twin Delayed Deep Deterministic Policy Gradient (TD3) and the Cross-Entropy Method (CEM). The resulting EC-enhanced TD3 algorithm (eTD3) are compared with the baseline algorithm TD3 and a state-of-the-art evolutionary reinforcement learning (ERL) method, CEM-TD3. Experimental results on five MuJoCo continuous control benchmark environments confirm the efficacy of our approach. The source code of the paper is available at https://github.com/ELO-Lab/eTD3.","PeriodicalId":296963,"journal":{"name":"2021 8th NAFOSTED Conference on Information and Computer Science (NICS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 8th NAFOSTED Conference on Information and Computer Science (NICS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NICS54270.2021.9701549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Hybridizations of Deep Reinforcement Learning (DRL) and Evolutionary Computation (EC) methods have recently shown considerable success in a variety of high-dimensional physical control tasks. These hybrid frameworks offer more robust mechanisms for exploration and exploitation in the policy network parameter search space by stabilizing the gradient-based updates of DRL algorithms with population-based operations adopted from EC methods. In this paper, we propose a novel hybrid framework that effectively combines the efficiency of DRL updates with the stability of EC populations. We experiment with integrating the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Cross-Entropy Method (CEM). The resulting EC-enhanced TD3 algorithm (eTD3) is compared with the baseline TD3 algorithm and a state-of-the-art evolutionary reinforcement learning (ERL) method, CEM-TD3. Experimental results on five MuJoCo continuous control benchmark environments confirm the efficacy of our approach. The source code of the paper is available at https://github.com/ELO-Lab/eTD3.
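To illustrate the population-based component that CEM-TD3-style hybrids such as eTD3 build on, the sketch below shows a generic Cross-Entropy Method loop over flattened policy parameters. It is a minimal illustration, not the authors' implementation: the `evaluate` fitness function is a hypothetical stand-in for rolling out a policy in a MuJoCo environment, the hyperparameters are arbitrary, and the TD3 gradient steps that the hybrid would apply to part of the population are only noted in a comment.

```python
import numpy as np

# Minimal sketch of a CEM loop over flattened policy parameters, the
# population-based half of CEM-TD3-style hybrids. Hyperparameters and the
# fitness function are illustrative assumptions, not taken from the paper.

def evaluate(theta):
    # Placeholder fitness: a real hybrid would roll out the policy
    # parameterised by `theta` and return the episode return.
    return -np.sum((theta - 1.0) ** 2)

def cem(dim, pop_size=10, elite_frac=0.5, iters=50, init_std=1.0):
    mean = np.zeros(dim)
    var = np.full(dim, init_std ** 2)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        # Sample a population of policy parameter vectors around the mean.
        population = mean + np.sqrt(var) * np.random.randn(pop_size, dim)
        # In a CEM-TD3-style hybrid, some individuals would additionally be
        # improved with TD3 gradient steps before evaluation (omitted here).
        scores = np.array([evaluate(ind) for ind in population])
        elites = population[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the elite set.
        mean = elites.mean(axis=0)
        var = elites.var(axis=0) + 1e-3  # noise floor against premature collapse
    return mean

if __name__ == "__main__":
    best = cem(dim=8)
    print("recovered parameters:", np.round(best, 2))
```

In the full hybrid described in the abstract, the distribution update above supplies the stability of the EC population, while TD3's off-policy gradient updates supply sample-efficient improvement of individual policies.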