KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination
Yin Gu, Qi Liu, Zhi Li, Kai Zhang
arXiv:2408.04336 (arXiv - CS - Artificial Intelligence), 2024-08-08
Zero-shot coordination (ZSC) remains a major challenge in cooperative AI:
the goal is to train an agent that can cooperate with unseen partners in the
training environments, or even in novel environments. In recent years, a popular
ZSC solution paradigm has been deep reinforcement learning (DRL) combined with
advanced self-play or population-based methods to enhance the neural policy's
ability to handle unseen partners. Despite some success, these approaches
usually rely on black-box neural networks as the policy function. However,
neural networks typically lack interpretability and logic, making the learned
policies difficult for partners (e.g., humans) to understand and limiting their
generalization ability. These shortcomings hinder the application of
reinforcement learning methods in diverse cooperative scenarios. We instead propose
representing the agent's policy as an interpretable program. Unlike neural
networks, programs contain stable logic, but they are non-differentiable and
difficult to optimize. To automatically learn such programs, we introduce
Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination
(KnowPC). We first define a foundational Domain-Specific Language (DSL),
including program structures, conditional primitives, and action primitives. A
significant challenge is the vast program search space, which makes it difficult to
find high-performing programs efficiently. To address this, KnowPC integrates
an extractor and a reasoner. The extractor discovers environmental transition
knowledge from multi-agent interaction trajectories, while the reasoner deduces
the preconditions of each action primitive based on the transition knowledge.
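The abstract does not spell out concrete DSL primitives, so the sketch below is only illustrative: hypothetical condition primitives (`holding_item`, `partner_near`, `station_free`) and action primitives stand in for the paper's domain-specific ones. It shows one plausible reading of the pipeline: the extractor records which conditions held whenever an action changed the state in interaction trajectories, and the reasoner deduces an action's precondition as the intersection of those condition sets.

```python
from collections import defaultdict

# Hypothetical condition primitives: predicates over an observed state.
# (Names are illustrative, not the paper's actual DSL.)
CONDITIONS = {
    "holding_item": lambda s: s["holding"] is not None,
    "partner_near": lambda s: s["partner_dist"] <= 1,
    "station_free": lambda s: s["station_free"],
}

def extract_transition_knowledge(trajectories):
    """Extractor: for each action primitive, collect the set of condition
    primitives that held whenever the action had an effect on the state."""
    knowledge = defaultdict(list)
    for traj in trajectories:
        for state, action, next_state in traj:
            if next_state != state:  # the action succeeded / had an effect
                held = frozenset(n for n, c in CONDITIONS.items() if c(state))
                knowledge[action].append(held)
    return knowledge

def deduce_preconditions(knowledge):
    """Reasoner: take an action's precondition to be the conditions that
    held in *every* recorded successful occurrence of that action."""
    return {a: frozenset.intersection(*sets) if sets else frozenset()
            for a, sets in knowledge.items()}

def run_program(rules, state):
    """A program here is an ordered list of (precondition, action) rules;
    the first rule whose precondition holds in the current state fires."""
    for precond, action in rules:
        if all(CONDITIONS[c](state) for c in precond):
            return action
    return "wait"  # default action when no rule applies
```

Under this reading, the deduced preconditions prune the program search space: only rules whose guard implies an action's precondition need to be considered.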