R. Perumalraja, A. S. Arjunkumar, N. N. Mohamed, E. Siva, S. Kamalesh
Periodico di Mineralogia, published 2022-04-01. DOI: 10.37896/pd91.4/91449
Text to Image Translation using GAN with NLP and Computer Vision
Generating high-quality images from text queries is a challenging problem in computer vision with many practical applications. This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions. We decompose this hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colours of the object from the given text description, yielding low-resolution Stage-I images. The Stage-II GAN takes the Stage-I results and the text description as inputs and generates high-resolution images with photo-realistic details; it can correct defects in the Stage-I results and add compelling details during refinement. To improve the diversity of the generated images and stabilize conditional-GAN training, we introduce a novel Conditioning Augmentation technique. Extensive experiments and comparisons on state-of-the-art benchmark datasets demonstrate that the proposed method achieves significant improvements in generating photo-realistic images conditioned on text queries.
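The Conditioning Augmentation idea mentioned above can be sketched briefly: instead of conditioning the generator on the raw text embedding, the embedding is mapped to the mean and standard deviation of a Gaussian, a conditioning vector is sampled from that Gaussian via the reparameterization trick, and a KL-divergence term regularizes the distribution toward the standard normal. The following is a minimal NumPy sketch under these assumptions; the function name, dimensions, and the linear parameterisation (`W`, `b`) are illustrative, not the authors' implementation.

```python
import numpy as np

def conditioning_augmentation(text_embedding, W, b, rng):
    """Sample a conditioning vector c ~ N(mu, diag(sigma^2)), where mu and
    log-sigma are linear functions of the text embedding (illustrative
    sketch of StackGAN-style Conditioning Augmentation)."""
    stats = text_embedding @ W + b              # (batch, 2 * cond_dim)
    cond_dim = stats.shape[-1] // 2
    mu, log_sigma = stats[:, :cond_dim], stats[:, cond_dim:]
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(mu.shape)
    c = mu + sigma * eps                        # reparameterization trick
    # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch; added to
    # the generator loss to keep the conditioning manifold smooth.
    kl = 0.5 * np.mean(
        np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * log_sigma, axis=-1))
    return c, kl

# Toy usage with random parameters (dimensions are assumptions)
rng = np.random.default_rng(0)
embed_dim, cond_dim, batch = 1024, 128, 4
W = 0.01 * rng.standard_normal((embed_dim, 2 * cond_dim))
b = np.zeros(2 * cond_dim)
emb = rng.standard_normal((batch, embed_dim))
c, kl = conditioning_augmentation(emb, W, b, rng)
print(c.shape)   # (4, 128)
```

Because a single text description maps to a distribution of conditioning vectors rather than one point, the generator sees small perturbations of each caption during training, which is what yields the improved diversity and more stable conditional-GAN training claimed in the abstract.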
About the journal:
Periodico di Mineralogia is an international peer-reviewed Open Access journal publishing Research Articles, Letters and Reviews in Mineralogy, Crystallography, Geochemistry, Ore Deposits, Petrology, Volcanology and applied topics in Environment, Archaeometry and Cultural Heritage. The journal encourages scientists to publish their experimental and theoretical results in as much detail as possible; accordingly, there is no restriction on article length. Additional data may be hosted on the journal's website as Supplementary Information. The journal has no article submission or processing charges. Colour is free of charge both online and in print, no Open Access fees are requested, and short publication times are assured.
Periodico di Mineralogia is the property of Sapienza Università di Roma and is published, both online and in print, three times a year.