Curiosity Creates Diversity in Policy Search

ACM Transactions on Evolutionary Learning. Pub Date: 2022-12-07. DOI: 10.1145/3605782

Paul-Antoine Le Tolguenec, E. Rachelson, Y. Besse, Dennis G. Wilson

Citations: 0

Abstract


When searching for policies, reward-sparse environments often lack sufficient information about which behaviors to improve upon or avoid. In such environments, the policy search process is bound to blindly search for reward-yielding transitions and no early reward can bias this search in one direction or another. A way to overcome this is to use intrinsic motivation in order to explore new transitions until a reward is found. In this work, we use a recently proposed definition of intrinsic motivation, Curiosity, in an evolutionary policy search method. We propose Curiosity-ES, an evolutionary strategy adapted to use Curiosity as a fitness metric. We compare Curiosity-ES with other evolutionary algorithms intended for exploration, as well as with Curiosity-based reinforcement learning, and find that Curiosity-ES can generate higher diversity without the need for an explicit diversity criterion and leads to more policies which find reward.
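The idea of using curiosity as a fitness metric can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not define Curiosity or the evolution strategy in detail, so the sketch assumes curiosity is measured as the prediction error of a learned forward model, and every name in it (`step`, `rollout`, `curiosity`, `curiosity_es`, the 1-D corridor, the tabular model, truncation selection) is hypothetical and chosen only for illustration.

```python
import random

N_STATES = 10          # hypothetical 1-D corridor with states 0..9
ACTIONS = (-1, +1)     # step left or step right
HORIZON = 12           # episode length


def step(state, action):
    """Deterministic toy dynamics: move one cell and clip to the corridor."""
    return min(max(state + action, 0), N_STATES - 1)


def rollout(theta):
    """Run one episode from state 0; theta[s] > 0 means 'go right' in state s."""
    state, transitions = 0, []
    for _ in range(HORIZON):
        action = ACTIONS[1] if theta[state] > 0 else ACTIONS[0]
        nxt = step(state, action)
        transitions.append((state, action, nxt))
        state = nxt
    return transitions


def curiosity(transitions, model):
    """Assumed fitness: number of transitions the forward model mispredicts."""
    return sum(1 for s, a, nxt in transitions if model.get((s, a)) != nxt)


def curiosity_es(generations=20, pop_size=16, elite=4, sigma=0.5, seed=0):
    """Toy evolution strategy that selects on curiosity only (no task reward)."""
    rng = random.Random(seed)
    model = {}       # tabular forward model: (state, action) -> predicted next state
    visited = set()  # states reached by any individual, as a crude diversity measure
    parents = [[rng.gauss(0, 1) for _ in range(N_STATES)] for _ in range(elite)]
    for _ in range(generations):
        # Mutate randomly chosen parents with Gaussian noise.
        population = [
            [w + rng.gauss(0, sigma) for w in rng.choice(parents)]
            for _ in range(pop_size)
        ]
        episodes = [rollout(theta) for theta in population]
        scores = [curiosity(ep, model) for ep in episodes]
        # Keep the individuals whose episodes surprised the model the most.
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        parents = [population[i] for i in ranked[:elite]]
        # Update the forward model on everything the population observed.
        for ep in episodes:
            for s, a, nxt in ep:
                model[(s, a)] = nxt
                visited.add(nxt)
    return visited, model
```

Calling `curiosity_es()` returns the set of visited states and the learned forward model. Because a transition stops being rewarded once the model predicts it, selection pressure in this sketch moves toward policies that reach transitions not yet seen, which is the intuition behind getting diversity without an explicit diversity criterion.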
