A survey of generative models used in text-to-image

Jingjing Xu, Jiahao Du, Junyi Wang
Applied and Computational Engineering, published 2024-07-25 (journal article). DOI: 10.54254/2755-2721/79/20241286. Citations: 0.

Abstract

The emergence and rapid development of neural networks have been pivotal in advancing text-to-image generative models, with particular emphasis on generative adversarial networks (GANs), variational autoencoders (VAEs), and autoregressive (AR) models. These models have greatly enriched the field, offering diverse avenues for image generation. Critical support has been provided by datasets such as MS COCO, Flickr30K, Visual Genome, and Conceptual Captions, along with essential evaluation metrics, including the Inception Score (IS), Fréchet Inception Distance (FID), precision, and recall. In this review, we examine the mechanisms and significance of each model and technique to give a holistic account of their contributions. GANs and VAEs both stand out as significant models within image-generation frameworks, each excelling in distinct aspects; because their strengths are complementary, both are discussed here. We also include other noteworthy approaches, such as autoregressive models, to provide a well-rounded assessment of current advances in the field. Among the datasets, MS COCO offers a diverse and extensive collection of images and serves as a cornerstone for model training, while Flickr30K, Visual Genome, and Conceptual Captions contribute valuable labeled examples that further enrich the learning process. The adoption of widely recognized metrics and methodologies allows for effective evaluation and comparison of models. In conclusion, the field's recent achievements owe much to the integration of these components: VAEs and GANs, with their unique strengths, complement each other, while metrics and datasets play supporting roles in advancing the capabilities of generative models for text-to-image synthesis. This survey underscores the synergy between models, metrics, and datasets, propelling the field toward new horizons.
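The two automatic metrics named in the abstract can be sketched in simplified form. The real metrics operate on Inception-v3 network outputs (class probabilities for IS, multivariate Gaussian statistics of pool-layer activations for FID); the sketch below is only an illustration of the formulas' structure, using hand-supplied label distributions and a one-dimensional Fréchet distance, not a faithful implementation of either metric.

```python
import math
import statistics

def inception_score(cond_probs):
    """IS = exp( mean_x KL( p(y|x) || p(y) ) ).
    cond_probs: one label distribution p(y|x) per generated image x.
    High when each image is confidently classified (sharp p(y|x))
    yet the images cover many classes (broad marginal p(y))."""
    n = len(cond_probs)
    k = len(cond_probs[0])
    # Marginal label distribution p(y), averaged over images.
    marginal = [sum(p[j] for p in cond_probs) / n for j in range(k)]
    # KL(p(y|x) || p(y)) for each image; skip zero-probability terms.
    kl_terms = [
        sum(p[j] * math.log(p[j] / marginal[j]) for j in range(k) if p[j] > 0)
        for p in cond_probs
    ]
    return math.exp(sum(kl_terms) / n)

def frechet_distance_1d(real, fake):
    """Fréchet distance between univariate Gaussians fitted to two
    sample sets: (mu_r - mu_f)^2 + (sigma_r - sigma_f)^2.
    FID is the multivariate version of this, computed on Inception
    activations; lower means the fake distribution is closer."""
    mu_r, mu_f = statistics.mean(real), statistics.mean(fake)
    sd_r, sd_f = statistics.pstdev(real), statistics.pstdev(fake)
    return (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2

# Two confidently classified images covering both of two classes:
# the score reaches its maximum, the number of classes (here 2.0).
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))  # → 2.0
# Identical sample sets give a Fréchet distance of zero.
print(frechet_distance_1d([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))  # → 0.0
```

In both cases the intuition carries over to the full metrics: IS rewards sharp, diverse conditional distributions, and FID penalizes any gap between the fitted Gaussian statistics of real and generated samples.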