Which criteria for autonomously shifting between goal-directed and habitual behaviors in robots?

2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob). Pub Date: 2015-12-07. DOI: 10.1109/DEVLRN.2015.7346152

Erwan Renaudo, Benoît Girard, R. Chatila, M. Khamassi


Citations: 14

Abstract



Research in psychology and neuroscience has provided strong evidence that mammals can adaptively switch between goal-directed behaviors (i.e., deliberative decisions based on costly but flexible planning of the long-term consequences of actions) and habitual behaviors (i.e., reactive behaviors that are efficient when the environment is stable but inflexible when it changes). However, the computational principles underlying this switching ability are not yet understood, and several alternative criteria have been proposed, each tested on a specific subset of experimental datasets. Here we present a neurorobotic implementation and comparison of such criteria, plus some new ones imported from the field of ensemble reinforcement learning, with a twofold objective: on the one hand, exploring the possible efficiency of such bio-inspired principles in giving robots more behavioral flexibility during autonomous development and learning; on the other hand, analyzing whether an asynchronous, continuous robotic simulation comparing these criteria on a common task can inform current debates in psychology and neuroscience. We evaluate these methods in a seemingly simple, repetitive cube-pushing task on a simulated conveyor belt, which nevertheless imposes constant trade-offs on the robot between speed and accuracy and between stability and abrupt changes. Our results show that, although using multiple behavioral systems does not improve overall performance in a stable environment, these methods allow better adaptation to environmental changes. The Voting methods and Boltzmann addition, from ensemble reinforcement learning, give the best performance, providing an interesting alternative to Expert selection.
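The abstract names several arbitration rules without defining them. The following is a minimal Python/NumPy sketch, not the authors' implementation, of how two experts (a goal-directed, model-based system and a habitual, model-free system) could be combined, assuming each outputs Q-values over the same discrete action set. The function names `boltzmann_addition`, `majority_voting` and `expert_selection`, the temperatures `tau_expert` and `tau_final`, and the lowest-entropy selection criterion are hypothetical choices for illustration; the authors' exact formulations and parameters may differ.

```python
import numpy as np


def softmax(values, tau):
    """Boltzmann (softmax) distribution over a 1-D array of values at temperature tau."""
    z = np.asarray(values, dtype=float) / tau
    z -= z.max()  # shift by the max for numerical stability; the distribution is unchanged
    e = np.exp(z)
    return e / e.sum()


def entropy(policy):
    """Shannon entropy (in nats) of a discrete action distribution."""
    p = policy[policy > 0]
    return float(-(p * np.log(p)).sum())


def boltzmann_addition(q_tables, tau_expert=1.0, tau_final=0.1):
    """Sum each expert's softmax policy, then turn the summed preferences into a policy with a second softmax."""
    preferences = sum(softmax(q, tau_expert) for q in q_tables)
    return softmax(preferences, tau_final)


def majority_voting(q_tables, tau_final=0.1):
    """Each expert votes for its greedy action; vote counts become a policy through a softmax."""
    votes = np.zeros(len(q_tables[0]))
    for q in q_tables:
        votes[np.argmax(q)] += 1
    return softmax(votes, tau_final)


def expert_selection(q_tables, tau_expert=1.0):
    """Hypothetical criterion: follow the single expert whose own policy has the lowest entropy."""
    policies = [softmax(q, tau_expert) for q in q_tables]
    return min(policies, key=entropy)


if __name__ == "__main__":
    # Hypothetical Q-values over three actions for the two systems in one state.
    q_goal_directed = np.array([1.0, 0.2, 0.0])  # model-based, planning-like expert
    q_habitual = np.array([0.1, 0.9, 0.0])  # model-free, cached-value expert
    experts = [q_goal_directed, q_habitual]
    rng = np.random.default_rng(0)
    for name, rule in [("Boltzmann addition", boltzmann_addition),
                       ("majority voting", majority_voting),
                       ("expert selection", expert_selection)]:
        policy = rule(experts)
        action = rng.choice(len(policy), p=policy)  # sample an action from the combined policy
        print(f"{name}: policy={np.round(policy, 3)}, sampled action={action}")
```

In this sketch, Boltzmann addition keeps every expert's full preference ordering, majority voting keeps only each expert's greedy choice, and expert selection ignores the non-selected expert entirely, which is one way to see why combination rules and selection rules can behave differently after an environmental change.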

