Controllable image generation and manipulation

Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation, Pub Date: 2023-06-12, DOI: 10.1145/3592572.3596476

I. Patras

Citations: 0

Abstract

Recent years have witnessed unprecedented interest in developing deep learning methods for generating images and image sequences that are barely distinguishable from real ones to the human eye. A major issue in this field is how to control the generation easily. In this talk we will focus on some of our recent work on generative models aimed primarily at controllable generation. We will first present unsupervised methods for learning non-linear paths in the latent spaces of Generative Adversarial Networks (GANs) such that following different paths leads to different types of change in the resulting images (e.g., removing the background, or changing the head pose or facial expression) [4]. Next, we will present a method that enables local editing by finding a Parts and Appearances decomposition in the GAN latent space [2]. Then, we will present recent work on reenactment [1], where the goal is to transfer the facial activity (pose, expressions, speech) of one person to another, and recent work in which supervision for generation comes from language models [3]. Finally, we will touch on the technical challenges ahead, as well as on the challenges that this technology creates with respect to the spread of misinformation.
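The idea that different paths in a GAN latent space should produce different kinds of image change can be made concrete with a small sketch. This is not the authors' method. Purely for illustration, it assumes that each non-linear path is the normalized gradient flow of a hypothetical sum of Gaussian radial basis functions (`RBFPath`). Its parameters are random rather than learned, and the latent size (8), number of centers (4), width (`gamma` = 0.1) and step size (0.1) are arbitrary choices. The generator that would map each latent code `z` to an image is omitted. A linear path with a fixed direction is included for contrast.

```python
import numpy as np


class RBFPath:
    """Hypothetical non-linear latent path (illustrative assumption only).

    The path follows the normalized gradient of
        f(z) = sum_k a_k * exp(-gamma_k * ||z - c_k||^2).
    Here the parameters are random; a real method would learn them.
    """

    def __init__(self, dim, num_centers, rng):
        self.centers = rng.standard_normal((num_centers, dim))  # c_k
        self.weights = rng.standard_normal(num_centers)         # a_k
        self.gammas = np.full(num_centers, 0.1)                  # gamma_k

    def direction(self, z):
        diff = z - self.centers                                  # (K, dim)
        sq_dist = np.sum(diff ** 2, axis=1)                      # (K,)
        rbf = self.weights * np.exp(-self.gammas * sq_dist)      # (K,)
        # d/dz [a * exp(-g ||z - c||^2)] = -2 * g * a * exp(...) * (z - c)
        grad = np.sum((-2.0 * self.gammas * rbf)[:, None] * diff, axis=0)
        norm = np.linalg.norm(grad)
        return grad / norm if norm > 0 else grad


def traverse(path, z0, step=0.1, num_steps=10):
    """Euler steps along the path; the direction is re-evaluated at each point."""
    z = z0.copy()
    zs = [z.copy()]
    for _ in range(num_steps):
        z = z + step * path.direction(z)
        zs.append(z.copy())
    return np.stack(zs)


def linear_traverse(z0, d, step=0.1, num_steps=10):
    """Linear baseline: a fixed unit direction d, independent of position."""
    d = d / np.linalg.norm(d)
    return np.stack([z0 + i * step * d for i in range(num_steps + 1)])


# Usage: three hypothetical paths starting from the same latent code.
rng = np.random.default_rng(0)
dim = 8
paths = [RBFPath(dim, num_centers=4, rng=rng) for _ in range(3)]
z0 = rng.standard_normal(dim)
trajectories = [traverse(p, z0) for p in paths]
```

Each `RBFPath` trajectory recomputes its step direction at every point, so it can bend through the latent space. `linear_traverse`, by contrast, always moves along the same direction. In an unsupervised method of the kind described above, the path parameters would presumably be trained so that each path yields a distinct, recognizable change in the generated images.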
