Learning of Generalizable and Interpretable Knowledge in Grid-Based Reinforcement Learning Environments

Manuel Eberhardinger, Johannes Maucher, Setareh Maghsudi
Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment
Published: 2023-10-06
DOI: 10.1609/aiide.v19i1.27516 (https://doi.org/10.1609/aiide.v19i1.27516)
Citation count: 0

Abstract

Understanding the interactions of agents trained with deep reinforcement learning is crucial for deploying agents in games or the real world. In the former, unreasonable actions confuse players. In the latter, that effect is even more significant, as unexpected behavior can cause accidents with potentially grave and long-lasting consequences for the involved individuals. In this work, we propose using program synthesis to imitate reinforcement learning policies after seeing a trajectory of the action sequence. Programs have the advantage that they are inherently interpretable and verifiable for correctness. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments, specifically, a navigation task and two miniature versions of Atari games, Space Invaders and Asterix. By inspecting the generated libraries, we can make inferences about the concepts the black-box agent has learned and better understand the agent's behavior. We achieve the same by visualizing the agent's decision-making process for the imitated sequences. We evaluate our approach with different types of program synthesizers based on a search-only method, a neural-guided search, and a language model fine-tuned on code.
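To make the idea of imitating a policy from a trajectory more concrete, the following is a minimal sketch of a search-only synthesizer on a toy grid navigation task. It is not DreamCoder and not the authors' implementation: the state layout `(agent_x, agent_y, goal_x, goal_y)`, the predicate names, and the decision-list program format are all assumptions made up for illustration.

```python
from itertools import product

ACTIONS = ["left", "right", "up", "down"]

# Hypothetical DSL predicates over a state (agent_x, agent_y, goal_x, goal_y).
# The y axis is assumed to grow downward, as in most grid renderings.
PREDICATES = {
    "goal_is_left": lambda s: s[2] < s[0],
    "goal_is_right": lambda s: s[2] > s[0],
    "goal_is_above": lambda s: s[3] < s[1],
    "goal_is_below": lambda s: s[3] > s[1],
}


def run(program, state):
    """Evaluate a decision list: the first rule whose predicate holds picks the action."""
    rules, default = program
    for pred_name, action in rules:
        if PREDICATES[pred_name](state):
            return action
    return default


def synthesize(trajectory, max_rules=3):
    """Enumerate decision lists by increasing size and return the first one
    that reproduces every (state, action) pair of the trajectory."""
    rule_space = list(product(PREDICATES, ACTIONS))
    for n_rules in range(max_rules + 1):
        for rules in product(rule_space, repeat=n_rules):
            for default in ACTIONS:
                program = (rules, default)
                if all(run(program, s) == a for s, a in trajectory):
                    return program
    return None  # no program within the size bound imitates the trajectory


# A made-up trajectory of a black-box agent walking from (0, 0) to a goal at (2, 1).
trajectory = [
    ((0, 0, 2, 1), "right"),
    ((1, 0, 2, 1), "right"),
    ((2, 0, 2, 1), "down"),
]

program = synthesize(trajectory)
print(program)
```

For this trajectory the search returns a single-rule program, "if the goal is to the right, move right, otherwise move down", which can be read and checked directly. That readability is the property the abstract attributes to programs; the neural-guided and language-model synthesizers mentioned there would replace the blind enumeration with a learned proposal step.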