{"title":"Physical scene understanding","authors":"Jiajun Wu","doi":"10.1002/aaai.12148","DOIUrl":null,"url":null,"abstract":"<p>Current AI systems still fail to match the flexibility, robustness, and generalizability of human intelligence: how even a young child can manipulate objects to achieve goals of their own invention or in cooperation, or can learn the essentials of a complex new task within minutes. We need AI with such embodied intelligence: transforming raw sensory inputs to rapidly build a rich understanding of the world for seeing, finding, and constructing things, achieving goals, and communicating with others. This problem of physical scene understanding is challenging because it requires a holistic interpretation of scenes, objects, and humans, including their geometry, physics, functionality, semantics, and modes of interaction, building upon studies across vision, learning, graphics, robotics, and AI. My research aims to address this problem by integrating bottom-up recognition models, deep networks, and inference algorithms with top-down structured graphical models, simulation engines, and probabilistic programs.</p>","PeriodicalId":7854,"journal":{"name":"Ai Magazine","volume":"45 1","pages":"156-164"},"PeriodicalIF":2.5000,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aaai.12148","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ai Magazine","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/aaai.12148","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Current AI systems still fail to match the flexibility, robustness, and generalizability of human intelligence: how even a young child can manipulate objects to achieve goals of their own invention or in cooperation, or can learn the essentials of a complex new task within minutes. We need AI with such embodied intelligence: transforming raw sensory inputs to rapidly build a rich understanding of the world for seeing, finding, and constructing things, achieving goals, and communicating with others. This problem of physical scene understanding is challenging because it requires a holistic interpretation of scenes, objects, and humans, including their geometry, physics, functionality, semantics, and modes of interaction, building upon studies across vision, learning, graphics, robotics, and AI. My research aims to address this problem by integrating bottom-up recognition models, deep networks, and inference algorithms with top-down structured graphical models, simulation engines, and probabilistic programs.
期刊介绍:
AI Magazine publishes original articles that are reasonably self-contained and aimed at a broad spectrum of the AI community. Technical content should be kept to a minimum. In general, the magazine does not publish articles that have been published elsewhere in whole or in part. The magazine welcomes the contribution of articles on the theory and practice of AI as well as general survey articles, tutorial articles on timely topics, conference or symposia or workshop reports, and timely columns on topics of interest to AI scientists.