A survey of generative adversarial networks and their application in text-to-image synthesis

Impact Factor 1.0 · JCR Q1 (Mathematics) · CAS Tier 4 (Mathematics)
Wu Zeng, Heng-liang Zhu, Chuan Lin, Zheng-ying Xiao
Electronic Research Archive · DOI: 10.3934/era.2023362 · Published: 2023 · Citations: 0

Abstract

With the continuous development of science and technology, and especially of computational devices with powerful computing capabilities, image generation based on deep learning has made significant progress. Cross-modal deep learning techniques that generate images from text descriptions have become a hot topic of current research. Text-to-image (T2I) synthesis has applications in multiple areas of computer vision, such as image enhancement, artificial-intelligence painting, games and virtual reality. T2I methods built on generative adversarial networks (GANs) can generate realistic and diverse images, but they also face shortcomings and challenges, such as difficulty in generating complex backgrounds. This review is organized as follows. First, we introduce the basic principles and architectures of the foundational and classic GANs. Second, we categorize T2I synthesis methods into four main classes: methods based on semantic enhancement, methods based on progressive structure, methods based on attention and methods based on introducing additional signals. We select some classic and recent T2I methods, introduce them and explain their main advantages and shortcomings. Third, we describe the standard datasets and evaluation metrics used in the T2I field. Finally, prospects for future research directions are discussed. This review provides a systematic introduction to basic GAN methods and the T2I methods built on them, and can serve as a reference for researchers.
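The adversarial training principle underlying all of the GAN-based T2I methods surveyed here can be illustrated with a minimal sketch. The toy example below (an illustrative construction, not any specific method from this review) pits a two-parameter linear generator against a logistic-regression discriminator on one-dimensional Gaussian data, using the non-saturating generator loss from the original GAN formulation; all names, hyperparameters and the data distribution are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Clip to avoid overflow warnings in np.exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60.0, 60.0)))

real_mu = 3.0          # real data: samples from N(3, 1)
w, b = 1.0, 0.0        # generator G(z) = w*z + b, noise z ~ N(0, 1)
a, c = 0.1, 0.0        # discriminator D(x) = sigmoid(a*x + c), estimates P(x is real)
lr = 0.05

for step in range(2000):
    x_real = rng.normal(real_mu, 1.0, size=64)
    z = rng.normal(0.0, 1.0, size=64)
    x_fake = w * z + b

    # Discriminator step: gradient ascent on E[log D(x)] + E[log(1 - D(G(z)))].
    d_real = sigmoid(a * x_real + c)
    d_fake = sigmoid(a * x_fake + c)
    a += lr * (np.mean((1.0 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1.0 - d_real) - np.mean(d_fake))

    # Generator step: gradient ascent on E[log D(G(z))] (non-saturating loss).
    d_fake = sigmoid(a * x_fake + c)
    w += lr * np.mean((1.0 - d_fake) * a * z)
    b += lr * np.mean((1.0 - d_fake) * a)

gen_mean = float(np.mean(w * rng.normal(0.0, 1.0, 10000) + b))
print(f"generator output mean after training: {gen_mean:.2f} (real mean = {real_mu})")
```

As training alternates between the two updates, the generator's output distribution drifts from its initial mean of 0 toward the real data mean; the same minimax dynamic, scaled up to convolutional networks conditioned on text embeddings, drives the T2I architectures discussed in this survey.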
