Meta-Reinforcement Learning for Mastering Multiple Skills and Generalizing across Environments in Text-based Games

Zhenjie Zhao, Mingfei Sun, Xiaojuan Ma
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing
DOI: https://doi.org/10.18653/v1/2021.metanlp-1.1
Citations: 2

Abstract

Text-based games can be used to develop task-oriented text agents that accomplish tasks given high-level language instructions, with potential applications in domains such as human-robot interaction. Given a text instruction, reinforcement learning is commonly used to train agents to complete the intended task because it learns policies automatically. However, because the space of combinatorial text actions is large, learning a policy network that generates an action word by word with reinforcement learning is challenging. Recent work shows that imitation learning provides an effective way of training a generation-based policy network. However, agents trained with imitation learning struggle to master a wide spectrum of task types or skills, and they also have difficulty generalizing to new environments. In this paper, we propose a meta-reinforcement-learning-based method that trains text agents through learning-to-explore. In particular, the text agent first explores the environment to gather task-specific information and then adapts the execution policy to solve the task with this information. On the publicly available testbed ALFWorld, we conducted a comparison study with imitation learning and show the superiority of our method.
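
The two-phase idea described in the abstract (explore first, then execute using what was gathered) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the ALFWorld API: `ToyTextEnv`, `explore`, and `execute` are hypothetical names, the environment is a toy in which one object is hidden in one receptacle, and both policies are hand-written rules rather than learned networks. It only shows how an execution step can be conditioned on information collected during exploration.

```python
from dataclasses import dataclass


@dataclass
class ToyTextEnv:
    """Hypothetical stand-in for a text game: one object hidden in one receptacle."""
    receptacles: tuple
    target: str
    hidden_in: str
    steps: int = 0

    def step(self, command: str) -> str:
        """Apply a text command and return a text observation."""
        self.steps += 1
        verb, _, arg = command.partition(" ")
        if verb == "open":
            if arg == self.hidden_in:
                return f"You open the {arg}. You see a {self.target}."
            return f"You open the {arg}. It is empty."
        if verb == "take":
            # arg has the form "<object> from <receptacle>"
            obj, _, place = arg.partition(" from ")
            if obj == self.target and place == self.hidden_in:
                return "Task completed."
            return "Nothing happens."
        return "Unknown command."


def explore(env: ToyTextEnv, target: str) -> dict:
    """Exploration phase (hand-written rule, assumption): open receptacles
    and record whether each observation mentions the target."""
    context = {}
    for name in env.receptacles:
        observation = env.step(f"open {name}")
        context[name] = f"a {target}" in observation
        if context[name]:
            break  # enough task-specific information has been gathered
    return context


def execute(env: ToyTextEnv, target: str, context: dict) -> str:
    """Execution phase (hand-written rule, assumption): pick the action
    conditioned on the context gathered during exploration."""
    for name, has_target in context.items():
        if has_target:
            return env.step(f"take {target} from {name}")
    return "Target not found during exploration."


env = ToyTextEnv(
    receptacles=("drawer", "cabinet", "fridge"),
    target="mug",
    hidden_in="cabinet",
)
context = explore(env, "mug")
result = execute(env, "mug", context)
print(context)
print(result)
```

In a learned version, both `explore` and `execute` would presumably be trainable policies and the context would be an encoded representation rather than a dictionary; the abstract does not specify these details, so the sketch leaves them out.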