{"title":"RiskAwareBench:为基于 LLM 的嵌入式代理的高级别规划评估物理风险意识","authors":"Zihao Zhu, Bingzhe Wu, Zhengyou Zhang, Baoyuan Wu","doi":"arxiv-2408.04449","DOIUrl":null,"url":null,"abstract":"The integration of large language models (LLMs) into robotics significantly\nenhances the capabilities of embodied agents in understanding and executing\ncomplex natural language instructions. However, the unmitigated deployment of\nLLM-based embodied systems in real-world environments may pose potential\nphysical risks, such as property damage and personal injury. Existing security\nbenchmarks for LLMs overlook risk awareness for LLM-based embodied agents. To\naddress this gap, we propose RiskAwareBench, an automated framework designed to\nassess physical risks awareness in LLM-based embodied agents. RiskAwareBench\nconsists of four modules: safety tips generation, risky scene generation, plan\ngeneration, and evaluation, enabling comprehensive risk assessment with minimal\nmanual intervention. Utilizing this framework, we compile the PhysicalRisk\ndataset, encompassing diverse scenarios with associated safety tips,\nobservations, and instructions. Extensive experiments reveal that most LLMs\nexhibit insufficient physical risk awareness, and baseline risk mitigation\nstrategies yield limited enhancement, which emphasizes the urgency and\ncruciality of improving risk awareness in LLM-based embodied agents in the\nfuture.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents\",\"authors\":\"Zihao Zhu, Bingzhe Wu, Zhengyou Zhang, Baoyuan Wu\",\"doi\":\"arxiv-2408.04449\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The integration of large language models (LLMs) into robotics significantly\\nenhances the capabilities of embodied agents in understanding and executing\\ncomplex natural language instructions. However, the unmitigated deployment of\\nLLM-based embodied systems in real-world environments may pose potential\\nphysical risks, such as property damage and personal injury. Existing security\\nbenchmarks for LLMs overlook risk awareness for LLM-based embodied agents. To\\naddress this gap, we propose RiskAwareBench, an automated framework designed to\\nassess physical risks awareness in LLM-based embodied agents. RiskAwareBench\\nconsists of four modules: safety tips generation, risky scene generation, plan\\ngeneration, and evaluation, enabling comprehensive risk assessment with minimal\\nmanual intervention. Utilizing this framework, we compile the PhysicalRisk\\ndataset, encompassing diverse scenarios with associated safety tips,\\nobservations, and instructions. Extensive experiments reveal that most LLMs\\nexhibit insufficient physical risk awareness, and baseline risk mitigation\\nstrategies yield limited enhancement, which emphasizes the urgency and\\ncruciality of improving risk awareness in LLM-based embodied agents in the\\nfuture.\",\"PeriodicalId\":501479,\"journal\":{\"name\":\"arXiv - CS - Artificial Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.04449\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.04449","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents
The integration of large language models (LLMs) into robotics significantly
enhances the capabilities of embodied agents in understanding and executing
complex natural language instructions. However, the unmitigated deployment of
LLM-based embodied systems in real-world environments may pose potential
physical risks, such as property damage and personal injury. Existing security
benchmarks for LLMs overlook risk awareness for LLM-based embodied agents. To
address this gap, we propose RiskAwareBench, an automated framework designed to
assess physical risks awareness in LLM-based embodied agents. RiskAwareBench
consists of four modules: safety tips generation, risky scene generation, plan
generation, and evaluation, enabling comprehensive risk assessment with minimal
manual intervention. Utilizing this framework, we compile the PhysicalRisk
dataset, encompassing diverse scenarios with associated safety tips,
observations, and instructions. Extensive experiments reveal that most LLMs
exhibit insufficient physical risk awareness, and baseline risk mitigation
strategies yield limited enhancement, which emphasizes the urgency and
cruciality of improving risk awareness in LLM-based embodied agents in the
future.