5美元模型:基于句子嵌入生成游戏地图和精灵

Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius
{"title":"5美元模型:基于句子嵌入生成游戏地图和精灵","authors":"Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius","doi":"10.1609/aiide.v19i1.27506","DOIUrl":null,"url":null,"abstract":"The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images or tile maps from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images or maps are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models' performance using cosine similarity score between text-image pairs generated by the CLIP VIT-B/32 model to demonstrate quality generation.","PeriodicalId":498041,"journal":{"name":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings\",\"authors\":\"Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius\",\"doi\":\"10.1609/aiide.v19i1.27506\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images or tile maps from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images or maps are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models' performance using cosine similarity score between text-image pairs generated by the CLIP VIT-B/32 model to demonstrate quality generation.\",\"PeriodicalId\":498041,\"journal\":{\"name\":\"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1609/aiide.v19i1.27506\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aiide.v19i1.27506","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

5美元的模型是一种轻量级的文本到图像生成架构,可以从编码的文本提示生成低维图像或贴图。该模型可以在训练数据量有限的情况下,成功地在低维域中生成准确且美观的内容。尽管模型和数据集都很小,但生成的图像或地图仍然能够保持文本提示的编码语义。我们将该模型应用于三个小数据集:像素艺术电子游戏地图,电子游戏精灵图像和缩小的表情符号图像,并应用新颖的增强策略来提高我们的模型在这些有限数据集上的性能。我们使用CLIP VIT-B/32模型生成的文本-图像对之间的余弦相似度评分来评估模型的性能,以演示质量生成。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings
The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images or tile maps from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images or maps are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models' performance using cosine similarity score between text-image pairs generated by the CLIP VIT-B/32 model to demonstrate quality generation.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信