{"title":"人形机器人中的强化学习","authors":"D. Katic","doi":"10.1109/NEUREL.2006.341182","DOIUrl":null,"url":null,"abstract":"Summary form only given. Dynamic bipedal walking is difficult to learn because combinatorial explosion in order to optimize performance in every possible configuration of the robot, uncertainties of the robot dynamics that must be only experimentally validated, and because coping with dynamic discontinuities caused by collisions with the ground and with the problem of delayed reward-torques applied at one time may have an effect on the performance many steps into the future. The detailed and precise training data for learning is often hard to obtain or may not be available in the process of biped control synthesis. Since no exact teaching information is available, this is a typical reinforcement learning problem and the failure signal serves as the reinforcement signal. Reinforcement learning (RL) offers one of the most general framework to humanoid robotics towards true autonomy and versatility. Various straightforward and hybrid intelligent control algorithms based RL for active and passive biped locomotion is presented. The proposed reinforcement learning algorithms is based on two different learning structures: actor-critic architecture and Q-learning structures. Also, RL algorithms can use numerical and fuzzy evaluative feedback information for external reinforcement. The proposed RL algorithms use the learning elements that consist of various types of neural networks, fuzzy logic nets or fuzzy-neuro networks with focus on fast convergence properties and small number of learning trials","PeriodicalId":231606,"journal":{"name":"2006 8th Seminar on Neural Network Applications in Electrical Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning in Humanoid Robotics Dusko Katic\",\"authors\":\"D. Katic\",\"doi\":\"10.1109/NEUREL.2006.341182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary form only given. Dynamic bipedal walking is difficult to learn because combinatorial explosion in order to optimize performance in every possible configuration of the robot, uncertainties of the robot dynamics that must be only experimentally validated, and because coping with dynamic discontinuities caused by collisions with the ground and with the problem of delayed reward-torques applied at one time may have an effect on the performance many steps into the future. The detailed and precise training data for learning is often hard to obtain or may not be available in the process of biped control synthesis. Since no exact teaching information is available, this is a typical reinforcement learning problem and the failure signal serves as the reinforcement signal. Reinforcement learning (RL) offers one of the most general framework to humanoid robotics towards true autonomy and versatility. Various straightforward and hybrid intelligent control algorithms based RL for active and passive biped locomotion is presented. The proposed reinforcement learning algorithms is based on two different learning structures: actor-critic architecture and Q-learning structures. Also, RL algorithms can use numerical and fuzzy evaluative feedback information for external reinforcement. 
The proposed RL algorithms use the learning elements that consist of various types of neural networks, fuzzy logic nets or fuzzy-neuro networks with focus on fast convergence properties and small number of learning trials\",\"PeriodicalId\":231606,\"journal\":{\"name\":\"2006 8th Seminar on Neural Network Applications in Electrical Engineering\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2006 8th Seminar on Neural Network Applications in Electrical Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NEUREL.2006.341182\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 8th Seminar on Neural Network Applications in Electrical Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NEUREL.2006.341182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Summary form only given. Dynamic bipedal walking is difficult to learn for several reasons: the combinatorial explosion involved in optimizing performance over every possible configuration of the robot; uncertainties in the robot dynamics that can only be validated experimentally; the dynamic discontinuities caused by collisions with the ground; and the problem of delayed reward, since torques applied at one instant may affect performance many steps into the future. Detailed and precise training data for learning is often hard to obtain, or may simply be unavailable, in the process of biped control synthesis. Since no exact teaching information is available, this is a typical reinforcement learning problem, and the failure signal serves as the reinforcement signal. Reinforcement learning (RL) offers one of the most general frameworks for moving humanoid robotics toward true autonomy and versatility. Various straightforward and hybrid intelligent control algorithms based on RL for active and passive biped locomotion are presented. The proposed reinforcement learning algorithms are built on two different learning structures: actor-critic architectures and Q-learning structures. The RL algorithms can also use numerical and fuzzy evaluative feedback information as external reinforcement. The proposed RL algorithms use learning elements consisting of various types of neural networks, fuzzy logic nets, or fuzzy-neuro networks, with a focus on fast convergence and a small number of learning trials.
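
For concreteness, below is a minimal sketch of the second learning structure named in the abstract: tabular Q-learning in which the failure signal is the only reinforcement (reward -1 on a fall, 0 otherwise). Everything concrete in it is an illustrative assumption rather than a detail from the paper: the state and action discretization sizes, the placeholder simulator hook env_step, and all constants are hypothetical.

    import numpy as np

    # Hypothetical discretization of the biped's state (e.g., binned trunk
    # angle and velocity) and action set (e.g., candidate ankle torques).
    N_STATES, N_ACTIONS = 64, 5
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # illustrative learning constants

    Q = np.zeros((N_STATES, N_ACTIONS))
    rng = np.random.default_rng(0)

    def env_step(state, action):
        """Placeholder biped simulator: returns (next_state, fell).
        A real implementation would integrate the robot dynamics and
        detect ground-collision failures."""
        next_state = int(rng.integers(N_STATES))
        fell = rng.random() < 0.05           # stand-in for a fall detector
        return next_state, fell

    def q_learning_trial(state):
        """Run one walking trial; the failure signal is the only reward."""
        while True:
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(Q[state]))
            next_state, fell = env_step(state, action)
            # Purely evaluative feedback: -1 on failure, 0 otherwise.
            reward = -1.0 if fell else 0.0
            target = reward if fell else reward + GAMMA * np.max(Q[next_state])
            Q[state, action] += ALPHA * (target - Q[state, action])
            if fell:
                return                       # trial ends at the failure
            state = next_state

    # Usage: repeat trials from random initial states until walking improves.
    for _ in range(1000):
        q_learning_trial(int(rng.integers(N_STATES)))

An actor-critic architecture, the other structure mentioned, would instead keep a separate critic that learns a state-value estimate from the same evaluative signal, and an actor whose action preferences are adjusted in the direction indicated by the critic's temporal-difference error.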