Toon Van de Maele;Tim Verbelen;Pietro Mazzaglia;Stefano Ferraro;Bart Dhoedt
{"title":"Object-Centric Scene Representations Using Active Inference","authors":"Toon Van de Maele;Tim Verbelen;Pietro Mazzaglia;Stefano Ferraro;Bart Dhoedt","doi":"10.1162/neco_a_01637","DOIUrl":null,"url":null,"abstract":"Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment. In this letter, we propose a novel approach for scene understanding, leveraging an object-centric generative model that enables an agent to infer object category and pose in an allocentric reference frame using active inference, a neuro-inspired framework for action and perception. For evaluating the behavior of an active vision agent, we also propose a new benchmark where, given a target viewpoint of a particular object, the agent needs to find the best matching viewpoint given a workspace with randomly positioned objects in 3D. We demonstrate that our active inference agent is able to balance epistemic foraging and goal-driven behavior, and quantitatively outperforms both supervised and reinforcement learning baselines by more than a factor of two in terms of success rate.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"677-704"},"PeriodicalIF":2.7000,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10535076/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment. In this letter, we propose a novel approach for scene understanding, leveraging an object-centric generative model that enables an agent to infer object category and pose in an allocentric reference frame using active inference, a neuro-inspired framework for action and perception. For evaluating the behavior of an active vision agent, we also propose a new benchmark where, given a target viewpoint of a particular object, the agent needs to find the best matching viewpoint given a workspace with randomly positioned objects in 3D. We demonstrate that our active inference agent is able to balance epistemic foraging and goal-driven behavior, and quantitatively outperforms both supervised and reinforcement learning baselines by more than a factor of two in terms of success rate.
期刊介绍:
Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.