How to Make a Pizza: Learning a Compositional Layer-Based GAN Model

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Publication date: 2019-06-01. DOI: 10.1109/CVPR.2019.00819

Dim P. Papadopoulos, Y. Tamaazousti, Ferda Ofli, Ingmar Weber, A. Torralba

Pages: 7994-8003

Citations: 32

Abstract





A food recipe is an ordered set of instructions for preparing a particular dish. From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish). In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. To do so, we learn composable module operations which are able to either add or remove a particular ingredient. Each operator is designed as a Generative Adversarial Network (GAN). Given only weak image-level supervision, the operators are trained to generate a visual layer that needs to be added to or removed from the existing image. The proposed model is able to decompose an image into an ordered sequence of layers by applying sequentially in the right order the corresponding removing modules. Experimental results on synthetic and real pizza images demonstrate that our proposed model is able to: (1) segment pizza toppings in a weakly supervised fashion, (2) remove them by revealing what is occluded underneath them (i.e., inpainting), and (3) infer the ordering of the toppings without any depth ordering supervision. Code, data, and models are available online.
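The abstract describes each operator as producing a visual layer that is added to or removed from the current image, and decomposition as applying the removing operators one after another in the right order. The pure-Python sketch below illustrates only that layered view, under loudly stated assumptions: a layer is modeled as an appearance image plus a soft mask combined by alpha blending (out = m * A + (1 - m) * x), images are single-channel 3x3 grids, and the learned GAN operators are replaced by hypothetical hard-coded "oracle" removers. The names `composite`, `add_topping`, `remove_topping`, and `decompose` are illustrative and are not taken from the authors' released code.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # single-channel image, values in [0, 1]
Mask = List[List[float]]   # soft mask: 1.0 means the pixel belongs to the layer


def composite(image: Image, layer: Image, mask: Mask) -> Image:
    """Alpha-blend `layer` over `image` pixel-wise: m * layer + (1 - m) * image."""
    return [
        [m * l + (1.0 - m) * x for x, l, m in zip(row_x, row_l, row_m)]
        for row_x, row_l, row_m in zip(image, layer, mask)
    ]


def add_topping(image: Image, appearance: Image, mask: Mask) -> Image:
    # "Add" operator (assumed form): paint the topping where the mask is on.
    return composite(image, appearance, mask)


def remove_topping(image: Image, revealed: Image, mask: Mask) -> Image:
    # "Remove" operator (assumed form): where the mask is on, show what lies
    # underneath. A trained GAN would hallucinate `revealed` (inpainting);
    # in this sketch it is supplied directly as a stand-in.
    return composite(image, revealed, mask)


# A remover maps the current image to (mask of its topping, revealed content).
Remover = Callable[[Image], Tuple[Mask, Image]]


def decompose(image: Image, removers: List[Remover]) -> Tuple[Image, List[Mask]]:
    """Apply removers in order (top layer first); the masks act as segmentations."""
    masks: List[Mask] = []
    for remover in removers:
        mask, revealed = remover(image)
        masks.append(mask)
        image = remove_topping(image, revealed, mask)
    return image, masks


def full(value: float, n: int = 3) -> Image:
    return [[value] * n for _ in range(n)]


# Toy pizza: dough (0.2), sauce over everything (0.5), one pepperoni slice (0.9).
dough = full(0.2)
sauce_mask = full(1.0)
pep_mask = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
pizza = add_topping(dough, full(0.5), sauce_mask)
pizza = add_topping(pizza, full(0.9), pep_mask)

# Hypothetical oracle removers standing in for the learned operators.
remove_pep: Remover = lambda img: (pep_mask, full(0.5))
remove_sauce: Remover = lambda img: (sauce_mask, full(0.2))

result, masks = decompose(pizza, [remove_pep, remove_sauce])
wrong_result, _ = decompose(pizza, [remove_sauce, remove_pep])
```

With these fixed toy removers, removing the pepperoni and then the sauce recovers the bare dough, while the reverse order leaves the center pixel at the sauce value (0.5). This is only a toy illustration of why the removal order matters; the paper's operators are learned and image-dependent.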


