Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi
{"title":"雅典娜:具有语言对比学习能力的安全自主机器人","authors":"Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi","doi":"arxiv-2408.11021","DOIUrl":null,"url":null,"abstract":"Due to emergent capabilities, large language models (LLMs) have been utilized\nas language-based agents to perform a variety of tasks and make decisions with\nan increasing degree of autonomy. These autonomous agents can understand\nhigh-level instructions, interact with their environments, and execute complex\ntasks using a selection of tools available to them. As the capabilities of the\nagents expand, ensuring their safety and trustworthiness becomes more\nimperative. In this study, we introduce the Athena framework which leverages\nthe concept of verbal contrastive learning where past safe and unsafe\ntrajectories are used as in-context (contrastive) examples to guide the agent\ntowards safety while fulfilling a given task. The framework also incorporates a\ncritiquing mechanism to guide the agent to prevent risky actions at every step.\nFurthermore, due to the lack of existing benchmarks on the safety reasoning\nability of LLM-based agents, we curate a set of 80 toolkits across 8 categories\nwith 180 scenarios to provide a safety evaluation benchmark. 
Our experimental\nevaluation, with both closed- and open-source LLMs, indicates verbal\ncontrastive learning and interaction-level critiquing improve the safety rate\nsignificantly.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Athena: Safe Autonomous Agents with Verbal Contrastive Learning\",\"authors\":\"Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi\",\"doi\":\"arxiv-2408.11021\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to emergent capabilities, large language models (LLMs) have been utilized\\nas language-based agents to perform a variety of tasks and make decisions with\\nan increasing degree of autonomy. These autonomous agents can understand\\nhigh-level instructions, interact with their environments, and execute complex\\ntasks using a selection of tools available to them. As the capabilities of the\\nagents expand, ensuring their safety and trustworthiness becomes more\\nimperative. In this study, we introduce the Athena framework which leverages\\nthe concept of verbal contrastive learning where past safe and unsafe\\ntrajectories are used as in-context (contrastive) examples to guide the agent\\ntowards safety while fulfilling a given task. The framework also incorporates a\\ncritiquing mechanism to guide the agent to prevent risky actions at every step.\\nFurthermore, due to the lack of existing benchmarks on the safety reasoning\\nability of LLM-based agents, we curate a set of 80 toolkits across 8 categories\\nwith 180 scenarios to provide a safety evaluation benchmark. 
Our experimental\\nevaluation, with both closed- and open-source LLMs, indicates verbal\\ncontrastive learning and interaction-level critiquing improve the safety rate\\nsignificantly.\",\"PeriodicalId\":501315,\"journal\":{\"name\":\"arXiv - CS - Multiagent Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multiagent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.11021\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.11021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Athena: Safe Autonomous Agents with Verbal Contrastive Learning
Due to their emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and to make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of available tools. As the capabilities of such agents expand, ensuring their safety and trustworthiness becomes increasingly imperative. In this study, we introduce the Athena framework, which leverages the concept of verbal contrastive learning: past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while it fulfills a given task. The framework also incorporates a critiquing mechanism that steers the agent away from risky actions at every step. Furthermore, because no existing benchmark targets the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.
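The two mechanisms the abstract describes can be sketched in code: verbal contrastive learning assembles past safe and unsafe trajectories into the agent's prompt as in-context examples, and an interaction-level critic screens each proposed action before it is executed. This is a minimal illustrative sketch, not the authors' implementation; every class, function, and action name below is an assumption for illustration only.

```python
# Hypothetical sketch of (1) verbal contrastive prompting and
# (2) interaction-level critiquing, per the abstract's description.
# All identifiers are illustrative assumptions, not Athena's API.
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """A past task trace, labeled as safe or unsafe."""
    task: str
    actions: list = field(default_factory=list)
    safe: bool = True


def build_contrastive_prompt(task: str, memory: list) -> str:
    """Assemble a prompt with paired safe/unsafe past trajectories
    as in-context (contrastive) examples for the agent."""
    safe = [t for t in memory if t.safe]
    unsafe = [t for t in memory if not t.safe]
    lines = [f"Task: {task}", "Contrastive examples:"]
    for t in safe[:2]:          # cap examples to keep the prompt short
        lines.append(f"[SAFE]   {t.task} -> {t.actions}")
    for t in unsafe[:2]:
        lines.append(f"[UNSAFE] {t.task} -> {t.actions}")
    lines.append("Propose the next action, avoiding the unsafe patterns above.")
    return "\n".join(lines)


def critique(action: str, banned=("delete_all_files", "send_credentials")) -> str:
    """Interaction-level critic: vets every proposed action and asks
    the agent to revise any that matches a known risky pattern."""
    return "revise" if action in banned else "approve"
```

A step of the agent loop would then call `build_contrastive_prompt` to query the LLM for an action and pass the result through `critique` before executing it, retrying on a "revise" verdict.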