{"title":"人形机器人中的强化学习","authors":"D. Katic","doi":"10.1109/NEUREL.2006.341182","DOIUrl":null,"url":null,"abstract":"Summary form only given. Dynamic bipedal walking is difficult to learn because combinatorial explosion in order to optimize performance in every possible configuration of the robot, uncertainties of the robot dynamics that must be only experimentally validated, and because coping with dynamic discontinuities caused by collisions with the ground and with the problem of delayed reward-torques applied at one time may have an effect on the performance many steps into the future. The detailed and precise training data for learning is often hard to obtain or may not be available in the process of biped control synthesis. Since no exact teaching information is available, this is a typical reinforcement learning problem and the failure signal serves as the reinforcement signal. Reinforcement learning (RL) offers one of the most general framework to humanoid robotics towards true autonomy and versatility. Various straightforward and hybrid intelligent control algorithms based RL for active and passive biped locomotion is presented. The proposed reinforcement learning algorithms is based on two different learning structures: actor-critic architecture and Q-learning structures. Also, RL algorithms can use numerical and fuzzy evaluative feedback information for external reinforcement. The proposed RL algorithms use the learning elements that consist of various types of neural networks, fuzzy logic nets or fuzzy-neuro networks with focus on fast convergence properties and small number of learning trials","PeriodicalId":231606,"journal":{"name":"2006 8th Seminar on Neural Network Applications in Electrical Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning in Humanoid Robotics Dusko Katic\",\"authors\":\"D. Katic\",\"doi\":\"10.1109/NEUREL.2006.341182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary form only given. Dynamic bipedal walking is difficult to learn because combinatorial explosion in order to optimize performance in every possible configuration of the robot, uncertainties of the robot dynamics that must be only experimentally validated, and because coping with dynamic discontinuities caused by collisions with the ground and with the problem of delayed reward-torques applied at one time may have an effect on the performance many steps into the future. The detailed and precise training data for learning is often hard to obtain or may not be available in the process of biped control synthesis. Since no exact teaching information is available, this is a typical reinforcement learning problem and the failure signal serves as the reinforcement signal. Reinforcement learning (RL) offers one of the most general framework to humanoid robotics towards true autonomy and versatility. Various straightforward and hybrid intelligent control algorithms based RL for active and passive biped locomotion is presented. The proposed reinforcement learning algorithms is based on two different learning structures: actor-critic architecture and Q-learning structures. Also, RL algorithms can use numerical and fuzzy evaluative feedback information for external reinforcement. 
The proposed RL algorithms use the learning elements that consist of various types of neural networks, fuzzy logic nets or fuzzy-neuro networks with focus on fast convergence properties and small number of learning trials\",\"PeriodicalId\":231606,\"journal\":{\"name\":\"2006 8th Seminar on Neural Network Applications in Electrical Engineering\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2006 8th Seminar on Neural Network Applications in Electrical Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NEUREL.2006.341182\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 8th Seminar on Neural Network Applications in Electrical Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NEUREL.2006.341182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Summary form only given. Dynamic bipedal walking is difficult to learn for several reasons: the combinatorial explosion involved in optimizing performance over every possible configuration of the robot; uncertainties in the robot dynamics that can only be validated experimentally; the dynamic discontinuities caused by collisions with the ground; and the problem of delayed reward, since torques applied at one instant may affect performance many steps into the future. Detailed and precise training data for learning is often hard to obtain, or may simply be unavailable, in the process of biped control synthesis. Since no exact teaching information is available, this is a typical reinforcement learning problem, and the failure signal serves as the reinforcement signal. Reinforcement learning (RL) offers one of the most general frameworks for moving humanoid robotics toward true autonomy and versatility. Various straightforward and hybrid intelligent control algorithms based on RL for active and passive biped locomotion are presented. The proposed reinforcement learning algorithms are built on two different learning structures: actor-critic architectures and Q-learning structures. The RL algorithms can also use numerical and fuzzy evaluative feedback information as external reinforcement. The proposed RL algorithms use learning elements consisting of various types of neural networks, fuzzy logic nets, or fuzzy-neuro networks, with a focus on fast convergence and a small number of learning trials.
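
For concreteness, below is a minimal sketch of the second learning structure named in the abstract: tabular Q-learning in which the failure signal is the only reinforcement (reward -1 on a fall, 0 otherwise). Everything concrete in it is an illustrative assumption rather than a detail from the paper: the state and action discretization sizes, the placeholder simulator hook env_step, and all constants are hypothetical.

    import numpy as np

    # Hypothetical discretization of the biped's state (e.g., binned trunk
    # angle and velocity) and action set (e.g., candidate ankle torques).
    N_STATES, N_ACTIONS = 64, 5
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # illustrative learning constants

    Q = np.zeros((N_STATES, N_ACTIONS))
    rng = np.random.default_rng(0)

    def env_step(state, action):
        """Placeholder biped simulator: returns (next_state, fell).
        A real implementation would integrate the robot dynamics and
        detect ground-collision failures."""
        next_state = int(rng.integers(N_STATES))
        fell = rng.random() < 0.05           # stand-in for a fall detector
        return next_state, fell

    def q_learning_trial(state):
        """Run one walking trial; the failure signal is the only reward."""
        while True:
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(Q[state]))
            next_state, fell = env_step(state, action)
            # Purely evaluative feedback: -1 on failure, 0 otherwise.
            reward = -1.0 if fell else 0.0
            target = reward if fell else reward + GAMMA * np.max(Q[next_state])
            Q[state, action] += ALPHA * (target - Q[state, action])
            if fell:
                return                       # trial ends at the failure
            state = next_state

    # Usage: repeat trials from random initial states until walking improves.
    for _ in range(1000):
        q_learning_trial(int(rng.integers(N_STATES)))

An actor-critic architecture, the other structure mentioned, would instead keep a separate critic that learns a state-value estimate from the same evaluative signal, and an actor whose action preferences are adjusted in the direction indicated by the critic's temporal-difference error.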