NovPhy: A Physical Reasoning Benchmark for Open-world AI Systems

IF 5.1 | Zone 2: Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence | Pub Date: 2024-08-02 | DOI: 10.1016/j.artint.2024.104198

Vimukthini Pinto, Chathura Gamage, Cheng Xue, Peng Zhang, Ekaterina Nikonova, Matthew Stephenson, Jochen Renz

Volume 336, Article 104198

Citations: 0

Abstract


NovPhy: A Physical Reasoning Benchmark for Open-world AI Systems

Due to the emergence of AI systems that interact with the physical environment, there is an increased interest in incorporating physical reasoning capabilities into those AI systems. But is it enough to only have physical reasoning capabilities to operate in a real physical environment? In the real world, we constantly face novel situations we have not encountered before. As humans, we are competent at successfully adapting to those situations. Similarly, an agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment. To facilitate the development of such AI systems, we propose a new benchmark, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties and take actions accordingly. The benchmark consists of tasks that require agents to detect and adapt to novelties in physical scenarios. To create tasks in the benchmark, we develop eight novelties representing a diverse novelty space and apply them to five commonly encountered scenarios in a physical environment, related to applying forces and motions such as rolling, falling, and sliding of objects. According to our benchmark design, we evaluate two capabilities of an agent: the performance on a novelty when it is applied to different physical scenarios and the performance on a physical scenario when different novelties are applied to it. We conduct a thorough evaluation with human players, learning agents, and heuristic agents. Our evaluation shows that humans' performance is far beyond the agents' performance. Some agents, even with good normal task performance, perform significantly worse when there is a novelty, and the agents that can adapt to novelties typically adapt slower than humans. We promote the development of intelligent agents capable of performing at the human level or above when operating in open-world physical environments. Benchmark website: https://github.com/phy-q/novphy
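
The two evaluation views named in the abstract (performance on a novelty across scenarios, and performance on a scenario across novelties) can be read as two aggregations over one novelty-by-scenario grid. The following is only a minimal sketch of that idea, not the benchmark's actual code: the function name `aggregate`, the use of a plain mean, the labels, and the pass rates are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate(results):
    """Summarize a grid of results along both axes.

    `results` maps (novelty, scenario) -> pass rate in [0, 1].
    Averaging with a plain mean is an assumption made for this sketch.
    """
    by_novelty = defaultdict(list)
    by_scenario = defaultdict(list)
    for (novelty, scenario), rate in results.items():
        by_novelty[novelty].append(rate)
        by_scenario[scenario].append(rate)
    # View 1: one novelty, averaged over the scenarios it was applied to.
    per_novelty = {n: mean(rates) for n, rates in by_novelty.items()}
    # View 2: one scenario, averaged over the novelties applied to it.
    per_scenario = {s: mean(rates) for s, rates in by_scenario.items()}
    return per_novelty, per_scenario


# Placeholder values only; these are not results from the paper.
toy_results = {
    ("novelty_1", "rolling"): 0.5,
    ("novelty_1", "falling"): 1.0,
    ("novelty_2", "rolling"): 0.0,
    ("novelty_2", "falling"): 0.5,
}
per_novelty, per_scenario = aggregate(toy_results)
print(per_novelty)
print(per_scenario)
```

In the full design described above, the grid would have eight novelties and five scenarios; the sketch uses a 2-by-2 grid to keep the arithmetic easy to follow.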


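Source journal

Artificial Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 118
Review time: 8 months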

About the journal: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.