Toward Improving the Generation Quality of Autoregressive Slot VAEs

IF 2.7 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation Pub Date : 2024-04-23 DOI:10.1162/neco_a_01635

Patrick Emami;Pan He;Sanjay Ranka;Anand Rangarajan

{"title":"Toward Improving the Generation Quality of Autoregressive Slot VAEs","authors":"Patrick Emami;Pan He;Sanjay Ranka;Anand Rangarajan","doi":"10.1162/neco_a_01635","DOIUrl":null,"url":null,"abstract":"Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations (“slots”) from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multiobject relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior to sequentially generate scene objects following a learned order. Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multiobject environments demonstrate clear gains in unconditional scene generation quality. Detailed ablation studies are also provided that validate the two proposed improvements.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"858-896"},"PeriodicalIF":2.7000,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10535075/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations (“slots”) from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multiobject relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior to sequentially generate scene objects following a learned order. Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multiobject environments demonstrate clear gains in unconditional scene generation quality. Detailed ablation studies are also provided that validate the two proposed improvements.

查看原文本刊更多论文

努力提高自回归槽式 VAE 的生成质量

无条件场景推理和生成对使用单一合成模型进行联合学习具有挑战性。尽管在从图像中提取以物体为中心的表征（"槽"）的模型方面取得了令人鼓舞的进展，但从 "槽 "中无条件生成场景的研究却较少受到关注。这主要是因为学习想象连贯场景所需的多物体关系非常困难。我们假设，大多数现有的基于插槽的模型学习物体相关性的能力有限。我们提出了两个改进方案来加强物体相关性学习。首先，将捕捉槽间高阶相关性的全局场景级变量作为槽的条件。其次，我们针对图像中物体缺乏典型顺序这一根本问题，提出了学习一致的顺序，用于场景物体的自回归生成。具体来说，我们先训练一个自回归插槽，然后按照学习到的顺序依次生成场景对象。有序插槽推理首先需要使用现有的从图像中提取插槽的方法来估计一组随机有序的插槽，然后将这些插槽与使用插槽先验自回归生成的有序插槽对齐。我们在三个多目标环境中进行的实验表明，无条件场景生成质量明显提高。我们还提供了详细的消融研究，验证了这两项改进建议。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Neural Computation 工程技术-计算机：人工智能

CiteScore

6.30

自引率

3.40%

发文量

审稿时长

3.0 months

期刊介绍： Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.