Controllable multi-domain semantic artwork synthesis

IF 18.3 3区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computational Visual Media Pub Date : 2024-01-03 DOI:10.1007/s41095-023-0356-2

Yuantian Huang, Satoshi Iizuka, Edgar Simo-Serra, Kazuhiro Fukui

{"title":"Controllable multi-domain semantic artwork synthesis","authors":"Yuantian Huang, Satoshi Iizuka, Edgar Simo-Serra, Kazuhiro Fukui","doi":"10.1007/s41095-023-0356-2","DOIUrl":null,"url":null,"abstract":"<p>We present a novel framework for the multi-domain synthesis of artworks from semantic layouts. One of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset called <i>ArtSem</i> that contains 40,000 images of artwork from four different domains, with their corresponding semantic label maps. We first extracted semantic maps from landscape photography and used a conditional generative adversarial network (GAN)-based approach for generating high-quality artwork from semantic maps without requiring paired training data. Furthermore, we propose an artwork-synthesis model using domain-dependent variational encoders for high-quality multi-domain synthesis. Subsequently, the model was improved and complemented with a simple but effective normalization method based on jointly normalizing semantics and style, which we call spatially style-adaptive normalization (SSTAN). Compared to the previous methods, which only take semantic layout as the input, our model jointly learns style and semantic information representation, improving the generation quality of artistic images. These results indicate that our model learned to separate the domains in the latent space. Thus, we can perform fine-grained control of the synthesized artwork by identifying hyperplanes that separate the different domains. Moreover, by combining the proposed dataset and approach, we generated user-controllable artworks of higher quality than that of existing approaches, as corroborated by quantitative metrics and a user study.\n</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"21 1","pages":""},"PeriodicalIF":18.3000,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Visual Media","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s41095-023-0356-2","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

We present a novel framework for the multi-domain synthesis of artworks from semantic layouts. One of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset called ArtSem that contains 40,000 images of artwork from four different domains, with their corresponding semantic label maps. We first extracted semantic maps from landscape photography and used a conditional generative adversarial network (GAN)-based approach for generating high-quality artwork from semantic maps without requiring paired training data. Furthermore, we propose an artwork-synthesis model using domain-dependent variational encoders for high-quality multi-domain synthesis. Subsequently, the model was improved and complemented with a simple but effective normalization method based on jointly normalizing semantics and style, which we call spatially style-adaptive normalization (SSTAN). Compared to the previous methods, which only take semantic layout as the input, our model jointly learns style and semantic information representation, improving the generation quality of artistic images. These results indicate that our model learned to separate the domains in the latent space. Thus, we can perform fine-grained control of the synthesized artwork by identifying hyperplanes that separate the different domains. Moreover, by combining the proposed dataset and approach, we generated user-controllable artworks of higher quality than that of existing approaches, as corroborated by quantitative metrics and a user study.

Abstract Image

查看原文本刊更多论文

可控多领域语义艺术品合成

我们提出了一个从语义布局多领域合成艺术作品的新框架。这项具有挑战性的任务的主要局限之一是缺乏用于艺术合成的公开可用的分割数据集。为了解决这个问题，我们提出了一个名为 ArtSem 的数据集，其中包含来自四个不同领域的 40,000 张艺术作品图像及其相应的语义标签图。我们首先从风景摄影中提取语义图，然后使用基于条件生成式对抗网络（GAN）的方法从语义图生成高质量的艺术作品，而无需配对训练数据。此外，我们还提出了一种艺术作品合成模型，该模型使用依赖于领域的变异编码器进行高质量的多领域合成。随后，我们对该模型进行了改进，并补充了一种基于语义和风格联合归一化的简单而有效的归一化方法，我们称之为空间风格自适应归一化（SSTAN）。与之前仅将语义布局作为输入的方法相比，我们的模型联合学习了风格和语义信息表示，从而提高了艺术图像的生成质量。这些结果表明，我们的模型学会了在潜在空间中分离域。因此，我们可以通过识别分隔不同领域的超平面，对合成的艺术作品进行精细控制。此外，通过结合所提出的数据集和方法，我们生成了用户可控的艺术作品，其质量高于现有方法，量化指标和用户研究也证实了这一点。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Computational Visual Media Computer Science-Computer Graphics and Computer-Aided Design

CiteScore

16.90

自引率

5.80%

发文量

243

审稿时长

6 weeks

期刊介绍： Computational Visual Media is a peer-reviewed open access journal. It publishes original high-quality research papers and significant review articles on novel ideas, methods, and systems relevant to visual media. Computational Visual Media publishes articles that focus on, but are not limited to, the following areas: • Editing and composition of visual media • Geometric computing for images and video • Geometry modeling and processing • Machine learning for visual media • Physically based animation • Realistic rendering • Recognition and understanding of visual media • Visual computing for robotics • Visualization and visual analytics Other interdisciplinary research into visual media that combines aspects of computer graphics, computer vision, image and video processing, geometric computing, and machine learning is also within the journal''s scope. This is an open access journal, published quarterly by Tsinghua University Press and Springer. The open access fees (article-processing charges) are fully sponsored by Tsinghua University, China. Authors can publish in the journal without any additional charges.