{"title":"基于先验知识的文本到图像合成半参数方法","authors":"Jiadong Liang","doi":"10.1145/3579654.3579717","DOIUrl":null,"url":null,"abstract":"Text-to-image synthesis adopts only text descriptions as input to generate consistent images which should have high visual quality and be semantically aligned with the input text. Compared to images, the textual semantics is ambiguous and sparse, which makes it challenging to map features directly and accurately from text space to image space. To address this issue, the intuitive method is to construct an intermediate space connecting text and image. Using layout as a bridge between text and image not only mitigates the difficulty of the task, but also constrains the spatial distribution of objects in the generated images, which is crucial to the quality of synthesized images. In this paper, we build a two-stage framework for text-to-image synthesis, i.e., Layout Searching by Text Matching, and Layout-to-Image Synthesis with Fine-Grained Textual Semantic Injection. Specifically, we build the prior layout knowledge from the training dataset and propose a semi-parametric layout searching strategy to retrieve the layout that matches the input sentence by measuring the semantic distance between different textual descriptions. In the stage of layout-to-image synthesis, we construct the Textual and Spatial Alignment Generative Adversarial Networks (TSAGANs) that are designed to guarantee the fine-grained alignment of the generated images with the input text and layout obtained in the first stage. Extensive experiments conducted on the COCO-stuff dataset manifest that our method can obtain more reasonable layouts and improve the performance of synthesized images significantly.","PeriodicalId":146783,"journal":{"name":"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Semi-Parametric Method for Text-to-Image Synthesis from Prior Knowledge\",\"authors\":\"Jiadong Liang\",\"doi\":\"10.1145/3579654.3579717\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text-to-image synthesis adopts only text descriptions as input to generate consistent images which should have high visual quality and be semantically aligned with the input text. Compared to images, the textual semantics is ambiguous and sparse, which makes it challenging to map features directly and accurately from text space to image space. To address this issue, the intuitive method is to construct an intermediate space connecting text and image. Using layout as a bridge between text and image not only mitigates the difficulty of the task, but also constrains the spatial distribution of objects in the generated images, which is crucial to the quality of synthesized images. In this paper, we build a two-stage framework for text-to-image synthesis, i.e., Layout Searching by Text Matching, and Layout-to-Image Synthesis with Fine-Grained Textual Semantic Injection. Specifically, we build the prior layout knowledge from the training dataset and propose a semi-parametric layout searching strategy to retrieve the layout that matches the input sentence by measuring the semantic distance between different textual descriptions. 
In the stage of layout-to-image synthesis, we construct the Textual and Spatial Alignment Generative Adversarial Networks (TSAGANs) that are designed to guarantee the fine-grained alignment of the generated images with the input text and layout obtained in the first stage. Extensive experiments conducted on the COCO-stuff dataset manifest that our method can obtain more reasonable layouts and improve the performance of synthesized images significantly.\",\"PeriodicalId\":146783,\"journal\":{\"name\":\"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3579654.3579717\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579654.3579717","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Semi-Parametric Method for Text-to-Image Synthesis from Prior Knowledge
Text-to-image synthesis takes only a text description as input and generates an image that should have high visual quality and be semantically aligned with the input text. Compared with images, textual semantics are ambiguous and sparse, which makes it challenging to map features directly and accurately from the text space to the image space. An intuitive way to address this issue is to construct an intermediate space connecting text and image. Using layout as a bridge between text and image not only reduces the difficulty of the task but also constrains the spatial distribution of objects in the generated images, which is crucial to the quality of the synthesized results. In this paper, we build a two-stage framework for text-to-image synthesis: Layout Searching by Text Matching, and Layout-to-Image Synthesis with Fine-Grained Textual Semantic Injection. Specifically, we build prior layout knowledge from the training dataset and propose a semi-parametric layout-searching strategy that retrieves the layout matching the input sentence by measuring the semantic distance between textual descriptions. In the layout-to-image stage, we construct Textual and Spatial Alignment Generative Adversarial Networks (TSAGANs), which are designed to guarantee fine-grained alignment of the generated images with the input text and with the layout obtained in the first stage. Extensive experiments on the COCO-Stuff dataset show that our method obtains more reasonable layouts and significantly improves the quality of the synthesized images.
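To make the first stage concrete, below is a minimal sketch of a semi-parametric layout search of the kind the abstract describes: training captions are embedded with a sentence encoder, and the layout paired with the caption closest to the input sentence is returned. The encoder, the cosine distance metric, and the (class_label, bounding_box) layout format are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of retrieval-based (semi-parametric) layout searching.
# Assumptions: `encoder` is any callable mapping a list of sentences to vectors,
# and each training caption is paired with its ground-truth layout, e.g. a list
# of (class_label, bounding_box) tuples. The actual encoder and distance metric
# used in the paper are not specified in the abstract.
import numpy as np

def embed(sentences, encoder):
    """Encode sentences and L2-normalize so that dot product equals cosine similarity."""
    vecs = np.asarray(encoder(sentences), dtype=np.float32)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def build_layout_bank(train_captions, train_layouts, encoder):
    """Prior layout knowledge: caption embeddings paired with their layouts."""
    return embed(train_captions, encoder), list(train_layouts)

def retrieve_layout(query_caption, caption_vecs, layouts, encoder):
    """Return the layout whose training caption is semantically closest to the query."""
    q = embed([query_caption], encoder)[0]
    sims = caption_vecs @ q  # cosine similarities against all training captions
    return layouts[int(np.argmax(sims))]
```

The retrieved layout would then condition the second-stage generator, which the paper realizes with TSAGANs to enforce textual and spatial alignment.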