Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi
{"title":"雅典娜:具有语言对比学习能力的安全自主机器人","authors":"Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi","doi":"arxiv-2408.11021","DOIUrl":null,"url":null,"abstract":"Due to emergent capabilities, large language models (LLMs) have been utilized\nas language-based agents to perform a variety of tasks and make decisions with\nan increasing degree of autonomy. These autonomous agents can understand\nhigh-level instructions, interact with their environments, and execute complex\ntasks using a selection of tools available to them. As the capabilities of the\nagents expand, ensuring their safety and trustworthiness becomes more\nimperative. In this study, we introduce the Athena framework which leverages\nthe concept of verbal contrastive learning where past safe and unsafe\ntrajectories are used as in-context (contrastive) examples to guide the agent\ntowards safety while fulfilling a given task. The framework also incorporates a\ncritiquing mechanism to guide the agent to prevent risky actions at every step.\nFurthermore, due to the lack of existing benchmarks on the safety reasoning\nability of LLM-based agents, we curate a set of 80 toolkits across 8 categories\nwith 180 scenarios to provide a safety evaluation benchmark. 
Our experimental\nevaluation, with both closed- and open-source LLMs, indicates verbal\ncontrastive learning and interaction-level critiquing improve the safety rate\nsignificantly.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Athena: Safe Autonomous Agents with Verbal Contrastive Learning\",\"authors\":\"Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi\",\"doi\":\"arxiv-2408.11021\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to emergent capabilities, large language models (LLMs) have been utilized\\nas language-based agents to perform a variety of tasks and make decisions with\\nan increasing degree of autonomy. These autonomous agents can understand\\nhigh-level instructions, interact with their environments, and execute complex\\ntasks using a selection of tools available to them. As the capabilities of the\\nagents expand, ensuring their safety and trustworthiness becomes more\\nimperative. In this study, we introduce the Athena framework which leverages\\nthe concept of verbal contrastive learning where past safe and unsafe\\ntrajectories are used as in-context (contrastive) examples to guide the agent\\ntowards safety while fulfilling a given task. The framework also incorporates a\\ncritiquing mechanism to guide the agent to prevent risky actions at every step.\\nFurthermore, due to the lack of existing benchmarks on the safety reasoning\\nability of LLM-based agents, we curate a set of 80 toolkits across 8 categories\\nwith 180 scenarios to provide a safety evaluation benchmark. 
Our experimental\\nevaluation, with both closed- and open-source LLMs, indicates verbal\\ncontrastive learning and interaction-level critiquing improve the safety rate\\nsignificantly.\",\"PeriodicalId\":501315,\"journal\":{\"name\":\"arXiv - CS - Multiagent Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multiagent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.11021\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.11021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Athena: Safe Autonomous Agents with Verbal Contrastive Learning
Due to their emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and to make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of available tools. As the capabilities of such agents expand, ensuring their safety and trustworthiness becomes increasingly imperative. In this study, we introduce the Athena framework, which leverages the concept of verbal contrastive learning: past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent towards safety while it fulfills a given task. The framework also incorporates a critiquing mechanism that steers the agent away from risky actions at every step. Furthermore, because no existing benchmark targets the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates that verbal contrastive learning and interaction-level critiquing significantly improve the safety rate.
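The two mechanisms the abstract describes can be sketched in code: verbal contrastive learning assembles past safe and unsafe trajectories into the agent's prompt as in-context examples, and an interaction-level critic screens each proposed action before it is executed. This is a minimal illustrative sketch, not the authors' implementation; every class, function, and action name below is an assumption for illustration only.

```python
# Hypothetical sketch of (1) verbal contrastive prompting and
# (2) interaction-level critiquing, per the abstract's description.
# All identifiers are illustrative assumptions, not Athena's API.
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """A past task trace, labeled as safe or unsafe."""
    task: str
    actions: list = field(default_factory=list)
    safe: bool = True


def build_contrastive_prompt(task: str, memory: list) -> str:
    """Assemble a prompt with paired safe/unsafe past trajectories
    as in-context (contrastive) examples for the agent."""
    safe = [t for t in memory if t.safe]
    unsafe = [t for t in memory if not t.safe]
    lines = [f"Task: {task}", "Contrastive examples:"]
    for t in safe[:2]:          # cap examples to keep the prompt short
        lines.append(f"[SAFE]   {t.task} -> {t.actions}")
    for t in unsafe[:2]:
        lines.append(f"[UNSAFE] {t.task} -> {t.actions}")
    lines.append("Propose the next action, avoiding the unsafe patterns above.")
    return "\n".join(lines)


def critique(action: str, banned=("delete_all_files", "send_credentials")) -> str:
    """Interaction-level critic: vets every proposed action and asks
    the agent to revise any that matches a known risky pattern."""
    return "revise" if action in banned else "approve"
```

A step of the agent loop would then call `build_contrastive_prompt` to query the LLM for an action and pass the result through `critique` before executing it, retrying on a "revise" verdict.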