The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings

Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. Pub Date: 2023-10-06. DOI: 10.1609/aiide.v19i1.27506

Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius


Citations: 0

Abstract


The five-dollar model is a lightweight text-to-image generative architecture that generates low-dimensional images or tile maps from an encoded text prompt. The model can generate accurate and aesthetically pleasing content in low-dimensional domains with limited amounts of training data. Despite the small size of both the model and the datasets, the generated images and maps retain the semantic meaning encoded in the text prompt. We apply the model to three small datasets (pixel art video game maps, video game sprite images, and down-scaled emoji images) and apply novel augmentation strategies to improve its performance on these limited datasets. We evaluate generation quality using the cosine similarity between the CLIP ViT-B/32 embeddings of each text-image pair.
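The evaluation metric named in the abstract, cosine similarity between a text embedding and an image embedding, can be illustrated with a minimal sketch. The vectors below are hypothetical toy values standing in for CLIP features; the abstract does not describe the authors' evaluation code, so this only shows how the score itself is computed.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption): real CLIP ViT-B/32 features
# have many more dimensions and come from the CLIP text and image encoders.
text_embedding = [0.2, 0.7, 0.1, 0.0]
matching_image_embedding = [0.25, 0.65, 0.05, 0.05]
unrelated_image_embedding = [0.9, 0.0, 0.1, 0.4]

score_match = cosine_similarity(text_embedding, matching_image_embedding)
score_unrelated = cosine_similarity(text_embedding, unrelated_image_embedding)

print(f"matching pair:  {score_match:.3f}")
print(f"unrelated pair: {score_unrelated:.3f}")
```

A higher score means the image embedding points in nearly the same direction as the text embedding, which is the sense in which the metric rewards images that match their prompts.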

