Learning of Generalizable and Interpretable Knowledge in Grid-Based Reinforcement Learning Environments

Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. Pub Date: 2023-10-06. DOI: 10.1609/aiide.v19i1.27516

Manuel Eberhardinger, Johannes Maucher, Setareh Maghsudi



Abstract



Understanding the interactions of agents trained with deep reinforcement learning is crucial for deploying agents in games or the real world. In games, unreasonable actions confuse players. In the real world, the effect is even more significant, as unexpected behavior can cause accidents with potentially grave and long-lasting consequences for the individuals involved. In this work, we propose using program synthesis to imitate reinforcement learning policies after seeing a trajectory of the action sequence. Programs have the advantage that they are inherently interpretable and can be verified for correctness. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments, specifically a navigation task and two miniature versions of Atari games, Space Invaders and Asterix. By inspecting the generated libraries, we can make inferences about the concepts the black-box agent has learned and better understand the agent's behavior. We achieve the same by visualizing the agent's decision-making process for the imitated sequences. We evaluate our approach with different types of program synthesizers based on a search-only method, a neural-guided search, and a language model fine-tuned on code.
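To make the idea of imitating a policy from a trajectory more concrete, the following is a minimal sketch of a search-only synthesizer on a toy grid navigation task. It is not DreamCoder and not the authors' implementation: it has no library learning and no neural guidance, and the state layout `(ax, ay, gx, gy)`, the predicate names, the decision-list program form, and the trajectory are all assumptions made up for illustration.

```python
from itertools import permutations, product

# Hypothetical toy DSL: a state is (ax, ay, gx, gy), the agent and goal
# coordinates on a grid. These predicates are illustrative assumptions.
PREDICATES = {
    "goal_is_right": lambda s: s[2] > s[0],
    "goal_is_left": lambda s: s[2] < s[0],
    "goal_is_below": lambda s: s[3] > s[1],
    "goal_is_above": lambda s: s[3] < s[1],
}
ACTIONS = ["left", "right", "up", "down"]


def run(program, state):
    """Evaluate a decision list: the first rule whose predicate holds picks the action."""
    rules, default = program
    for pred_name, action in rules:
        if PREDICATES[pred_name](state):
            return action
    return default


def synthesize(trajectory, max_rules=3):
    """Enumerate decision lists from short to long and return the first
    one that reproduces every (state, action) pair, or None."""
    for n_rules in range(max_rules + 1):
        for preds in permutations(PREDICATES, n_rules):
            for acts in product(ACTIONS, repeat=n_rules + 1):
                program = (list(zip(preds, acts[:-1])), acts[-1])
                if all(run(program, s) == a for s, a in trajectory):
                    return program
    return None


def show(program):
    """Render a decision list as readable pseudo-code."""
    rules, default = program
    lines = [f"if {pred}: {action}" for pred, action in rules]
    lines.append(f"else: {default}")
    return "\n".join(lines)


# Made-up trajectory of a black-box agent walking from (0, 0) to a goal at (2, 1).
trajectory = [
    ((0, 0, 2, 1), "right"),
    ((1, 0, 2, 1), "right"),
    ((2, 0, 2, 1), "down"),
]
program = synthesize(trajectory)
print(show(program))
```

Because the search goes from shorter to longer programs, the first match is also one of the shortest programs consistent with the trajectory, and the printed decision list can be read and checked by hand, which is the kind of interpretability the abstract refers to.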
