Precise Evaluation for Continuous Action Control in Reinforcement Learning
Fengkai Ke, Daxing Zhao, Guodong Sun, Wei Feng
Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference, 2019-06-22
DOI: 10.1145/3341069.3341082
Citations: 0
Abstract
With the development of deep learning, reinforcement learning has gradually attracted attention and has achieved remarkable results in video games, the game of Go, and other fields. However, most of the control problems in these fields are discrete-action control tasks with abundant rewards. Continuous action control is closer to real-world control problems and is considered one of the main paths toward artificial intelligence, which makes it an active research topic. Traditional reinforcement learning algorithms for continuous control evaluate the multiple outputs of the action network with a single scalar value. This paper proposes an accurate evaluation mechanism and a corresponding objective function to accelerate reinforcement learning training. Experimental results show that accurate evaluation with the log-cosh objective function enables a robotic arm to learn a grasping task more quickly, converge, and complete training.
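The abstract names a log-cosh objective but does not define it. As a minimal sketch, assuming the standard definition (the mean of log(cosh(prediction - target)) over all residuals), the loss can be computed in a numerically stable way as shown below. Treating the inputs as one predicted value per action dimension is an illustrative assumption, not the authors' confirmed formulation, and the function names `log_cosh` and `log_cosh_loss` are hypothetical.

```python
import math
from typing import Sequence


def log_cosh(x: float) -> float:
    """Numerically stable log(cosh(x)).

    Uses the identity log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2),
    which avoids the overflow of math.cosh(x) for large |x|.
    """
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_cosh_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean log-cosh loss over paired predictions and targets.

    Hypothetical usage (assumption, not from the paper): `predicted` holds
    per-action-dimension value estimates and `target` the matching targets.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if len(predicted) == 0:
        raise ValueError("inputs must be non-empty")
    residuals = [p - t for p, t in zip(predicted, target)]
    return sum(log_cosh(r) for r in residuals) / len(residuals)
```

For small residuals log(cosh(x)) is close to x^2 / 2, like a squared error, while for large residuals it approaches |x| - log(2), like an absolute error. Its derivative is tanh(x), which stays within (-1, 1), so large errors do not produce exploding gradients. This is one plausible reason such an objective could stabilize training, though the abstract does not state the authors' rationale.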