{"title":"可控图像生成和处理","authors":"I. Patras","doi":"10.1145/3592572.3596476","DOIUrl":null,"url":null,"abstract":"Recent years have witnessed an unprecedented interest in developing Deep Learning methodologies for the generation of images and image sequences that are hardly distinguishable to the human eye from real ones. A major issue in this field is how the generation can be easily controlled. In this talk we will focus on some of our recent works in generative models that are primarily aimed at controllable generation. We will first present unsupervised methods for learning non-linear paths in the latent spaces of Generative Adversarial Networks such that following different paths lead to different types of changes (e.g., removing the background, changing head poses, or facial expressions) in the resulting images [4]. Subsequently, we will present a method that allows local editing by finding a Parts and Appearances decomposition in the GAN latent space [2]. Then, we will present recent works on reenactment [1], where the goal is to transfer the facial activity (pose, expressions, speech) of a certain person to another one, and recent works in which supervision for generation comes from language models [3]. Finally, we will touch on the technical challenges ahead, as well on the challenges that this creates in spreading misinformation.","PeriodicalId":239252,"journal":{"name":"Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Controllable image generation and manipulation\",\"authors\":\"I. Patras\",\"doi\":\"10.1145/3592572.3596476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent years have witnessed an unprecedented interest in developing Deep Learning methodologies for the generation of images and image sequences that are hardly distinguishable to the human eye from real ones. A major issue in this field is how the generation can be easily controlled. In this talk we will focus on some of our recent works in generative models that are primarily aimed at controllable generation. We will first present unsupervised methods for learning non-linear paths in the latent spaces of Generative Adversarial Networks such that following different paths lead to different types of changes (e.g., removing the background, changing head poses, or facial expressions) in the resulting images [4]. Subsequently, we will present a method that allows local editing by finding a Parts and Appearances decomposition in the GAN latent space [2]. Then, we will present recent works on reenactment [1], where the goal is to transfer the facial activity (pose, expressions, speech) of a certain person to another one, and recent works in which supervision for generation comes from language models [3]. 
Finally, we will touch on the technical challenges ahead, as well on the challenges that this creates in spreading misinformation.\",\"PeriodicalId\":239252,\"journal\":{\"name\":\"Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation\",\"volume\":\"73 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3592572.3596476\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3592572.3596476","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Recent years have witnessed an unprecedented interest in developing Deep Learning methodologies for the generation of images and image sequences that are hardly distinguishable by the human eye from real ones. A major issue in this field is how the generation process can be easily controlled. In this talk we will focus on some of our recent works on generative models that are primarily aimed at controllable generation. We will first present unsupervised methods for learning non-linear paths in the latent spaces of Generative Adversarial Networks (GANs), such that following different paths leads to different types of changes in the resulting images (e.g., removing the background, or changing the head pose or facial expression) [4]. Subsequently, we will present a method that allows local editing by finding a Parts and Appearances decomposition in the GAN latent space [2]. Then, we will present recent works on reenactment [1], where the goal is to transfer the facial activity (pose, expressions, speech) of one person to another, and recent works in which supervision for generation comes from language models [3]. Finally, we will touch on the technical challenges ahead, as well as on the challenges that this technology creates in the spread of misinformation.
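To make the first idea more concrete, the following is a minimal PyTorch sketch of latent-space editing: a latent code is shifted along a direction and decoded by a generator. The generator here is a toy stand-in for a pretrained GAN, the linear shift is a simplification of the non-linear paths learned in [4], and the name edit_along_direction is illustrative rather than taken from the cited work.

import torch
import torch.nn as nn

latent_dim = 64

# Toy stand-in for a pretrained GAN generator (a StyleGAN-like model in practice).
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32),
    nn.Tanh(),
)

def edit_along_direction(z, direction, alpha):
    """Shift a latent code along a unit-norm direction and decode the edited image."""
    direction = direction / direction.norm()
    z_edited = z + alpha * direction
    return generator(z_edited).view(-1, 3, 32, 32)

# Traverse the same direction with increasing strength.
z = torch.randn(1, latent_dim)
direction = torch.randn(latent_dim)
frames = [edit_along_direction(z, direction, alpha) for alpha in (0.0, 1.0, 2.0, 3.0)]
print([tuple(f.shape) for f in frames])

In a trained GAN, sweeping alpha along a well-chosen (or learned) path would yield a sequence of images in which a single attribute changes progressively while the rest of the content stays fixed.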
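In the same spirit, supervision from a language model can be sketched as optimising a latent code so that the decoded image matches a text prompt under a vision-language model. The sketch below uses OpenAI's CLIP purely as an illustrative, assumed choice of model, with the same kind of toy generator standing in for a pretrained GAN; it is not the method of [3].

import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cpu"  # kept on CPU to keep the sketch simple
latent_dim = 64

# Toy stand-in generator again; a real setup would use a pretrained GAN.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)

clip_model, _ = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    text_features = clip_model.encode_text(clip.tokenize(["a smiling face"]).to(device))
    text_features = F.normalize(text_features, dim=-1)

z = torch.randn(1, latent_dim, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)

for step in range(100):
    img = generator(z).view(1, 3, 32, 32)
    # Resize to 224x224 for CLIP's visual encoder (full CLIP preprocessing also
    # normalizes channels; that step is omitted here for brevity).
    img_224 = F.interpolate((img + 1) / 2, size=224, mode="bilinear", align_corners=False)
    image_features = F.normalize(clip_model.encode_image(img_224), dim=-1)
    loss = 1.0 - (image_features * text_features).sum()  # cosine distance to the prompt
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final CLIP loss: {loss.item():.3f}")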