Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius
{"title":"5美元模型:基于句子嵌入生成游戏地图和精灵","authors":"Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius","doi":"10.1609/aiide.v19i1.27506","DOIUrl":null,"url":null,"abstract":"The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images or tile maps from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images or maps are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models' performance using cosine similarity score between text-image pairs generated by the CLIP VIT-B/32 model to demonstrate quality generation.","PeriodicalId":498041,"journal":{"name":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings\",\"authors\":\"Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius\",\"doi\":\"10.1609/aiide.v19i1.27506\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images or tile maps from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images or maps are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models' performance using cosine similarity score between text-image pairs generated by the CLIP VIT-B/32 model to demonstrate quality generation.\",\"PeriodicalId\":498041,\"journal\":{\"name\":\"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1609/aiide.v19i1.27506\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aiide.v19i1.27506","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings
The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images or tile maps from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images or maps are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models' performance using cosine similarity score between text-image pairs generated by the CLIP VIT-B/32 model to demonstrate quality generation.