Athena: Safe Autonomous Agents with Verbal Contrastive Learning

Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi
arXiv - CS - Multiagent Systems. Published 2024-08-20. DOI: arxiv-2408.11021 (https://doi.org/arxiv-2408.11021)

Abstract

Owing to their emergent capabilities, large language models (LLMs) are increasingly used as language-based agents that perform a variety of tasks and make decisions with a growing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using the tools available to them. As the agents' capabilities expand, ensuring their safety and trustworthiness becomes imperative. In this study, we introduce the Athena framework, which leverages verbal contrastive learning: past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while it fulfills a given task. The framework also incorporates a critiquing mechanism that guides the agent away from risky actions at every step. Furthermore, because benchmarks on the safety reasoning ability of LLM-based agents are lacking, we curate a safety evaluation benchmark of 80 toolkits across 8 categories with 180 scenarios. Our experimental evaluation, with both closed- and open-source LLMs, indicates that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.
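
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how past trajectories are stored or retrieved, nor how the critic is prompted. The `Trajectory` record, the word-overlap `similarity` function (a stand-in for whatever retrieval a real system would use), the prompt wording, and the `propose`/`critique` callables are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one past agent run."""
    task: str    # instruction the agent was given
    steps: str   # verbal record of the actions it took
    safe: bool   # label assigned after the run was evaluated


def similarity(a, b):
    """Jaccard overlap of word sets; an assumed stand-in for real retrieval."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pick_contrastive_pair(task, memory):
    """Return the most similar safe and unsafe past runs (either may be None)."""
    def best(label):
        candidates = [t for t in memory if t.safe == label]
        return max(candidates, key=lambda t: similarity(task, t.task), default=None)
    return best(True), best(False)


def build_prompt(task, memory):
    """Place one safe and one unsafe example in context ahead of the new task."""
    safe, unsafe = pick_contrastive_pair(task, memory)
    parts = []
    if safe is not None:
        parts.append(f"SAFE example\nTask: {safe.task}\n{safe.steps}")
    if unsafe is not None:
        parts.append(f"UNSAFE example (do not imitate)\nTask: {unsafe.task}\n{unsafe.steps}")
    parts.append(f"Current task: {task}\nComplete it while avoiding the risks shown above.")
    return "\n\n".join(parts)


def step_with_critic(propose, critique, max_revisions=2):
    """Ask for an action, let a critic veto it, and retry with the feedback.

    propose(feedback) returns an action string. critique(action) returns None
    when the action is acceptable, otherwise a feedback string. The function
    returns None if no acceptable action is found within the revision budget.
    """
    feedback = None
    for _ in range(max_revisions + 1):
        action = propose(feedback)
        feedback = critique(action)
        if feedback is None:
            return action
    return None
```

In a real agent, `propose` and `critique` would each wrap an LLM call, and `step_with_critic` would run once per interaction step so that a risky action is intercepted before it reaches a tool.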