CookGAN: Causality Based Text-to-Image Synthesis

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Pub Date: 2020-06-01; DOI: 10.1109/cvpr42600.2020.00556

B. Zhu, C. Ngo

Citations: 46

Abstract

This paper addresses text-to-image synthesis from a new perspective: the cause-and-effect chain in image generation. Causality is common in cooking, where the appearance of a dish changes depending on the cooking actions and ingredients. The challenge for synthesis is that a generated image should depict the visual result of an action applied to an object. This paper presents a new network architecture, CookGAN, that mimics the visual effects of the causality chain, preserves fine-grained details, and progressively upsamples the image. In particular, a cooking simulator sub-network is proposed that incrementally modifies food images over a series of steps, based on the interaction between ingredients and cooking methods. Experiments on Recipe1M show that CookGAN generates food images with a reasonably impressive inception score. Furthermore, the generated images are semantically interpretable and manipulable.
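The abstract reports its results in terms of the inception score. For reference, the sketch below shows how that score is computed from classifier outputs, IS = exp(E_x[KL(p(y|x) || p(y))]), assuming the per-image class probabilities p(y|x) are already available (obtaining them from an Inception-v3 network is omitted). The function name `inception_score` is illustrative and is not taken from the paper's code.

```python
import math
from typing import Sequence


def inception_score(probs: Sequence[Sequence[float]]) -> float:
    """Inception Score from per-image class-probability vectors p(y|x).

    IS = exp( mean over images of KL( p(y|x) || p(y) ) ),
    where p(y) is the average of p(y|x) over all images.
    In practice p(y|x) comes from an Inception-v3 classifier;
    this sketch takes the probabilities as input directly.
    """
    n = len(probs)
    if n == 0:
        raise ValueError("need at least one probability vector")
    k = len(probs[0])

    # Marginal class distribution p(y), averaged over the generated images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]

    total_kl = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); terms with p(y|x) = 0 contribute nothing,
        # and p(y|x) > 0 guarantees p(y) > 0, so the ratio is well defined.
        total_kl += sum(
            pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0
        )

    return math.exp(total_kl / n)
```

For example, two images that are each classified confidently into a different class give `inception_score([[1.0, 0.0], [0.0, 1.0]])` of approximately 2.0, while identical, uncertain predictions such as `[[0.5, 0.5], [0.5, 0.5]]` give 1.0. The score therefore rewards images that are individually recognizable and collectively diverse. Published figures are usually averaged over several splits of a large sample of generated images; this sketch computes a single split.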

