{"title":"PIPformers:利用视觉变换器进行基于补丁的内绘,实现通用绘画","authors":"Jeyoung Lee, Hochul Kang","doi":"10.1002/cav.2270","DOIUrl":null,"url":null,"abstract":"<p>Image inpainting is a field that has been traditionally attempted in the field of computer vision. After the development of deep learning, image inpainting has been advancing endlessly together with convolutional neural networks and generative adversarial networks. Thereafter, it has been expanded to various fields such as image filing through guiding and image inpainting using various masking. Furthermore, the field termed image out-painting has also been pioneered. Meanwhile, after the recent announcement of the vision transformer, various computer vision problems have been attempted using the vision transformer. In this paper, we are trying to solve the problem of image generalization painting using the vision transformer. This is an attempt to fill images with painting no matter whether the areas where painting is missing are in or out of the images, and without guiding. To that end, the painting problem was defined as a problem to drop images in patch units for easy use in the vision transformer. And we solved the problem with a simple network structure created by slightly modifying the vision transformer to fit the problem. We named this network PIPformers. PIPformers achieved better values than other papers compared to PSNR, RMSE and SSIM.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2270","citationCount":"0","resultStr":"{\"title\":\"PIPformers: Patch based inpainting with vision transformers for generalize paintings\",\"authors\":\"Jeyoung Lee, Hochul Kang\",\"doi\":\"10.1002/cav.2270\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Image inpainting is a field that has been traditionally attempted in the field of computer vision. After the development of deep learning, image inpainting has been advancing endlessly together with convolutional neural networks and generative adversarial networks. Thereafter, it has been expanded to various fields such as image filing through guiding and image inpainting using various masking. Furthermore, the field termed image out-painting has also been pioneered. Meanwhile, after the recent announcement of the vision transformer, various computer vision problems have been attempted using the vision transformer. In this paper, we are trying to solve the problem of image generalization painting using the vision transformer. This is an attempt to fill images with painting no matter whether the areas where painting is missing are in or out of the images, and without guiding. To that end, the painting problem was defined as a problem to drop images in patch units for easy use in the vision transformer. And we solved the problem with a simple network structure created by slightly modifying the vision transformer to fit the problem. We named this network PIPformers. PIPformers achieved better values than other papers compared to PSNR, RMSE and SSIM.</p>\",\"PeriodicalId\":50645,\"journal\":{\"name\":\"Computer Animation and Virtual Worlds\",\"volume\":\"35 3\",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2024-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2270\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Animation and Virtual Worlds\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cav.2270\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Animation and Virtual Worlds","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cav.2270","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
PIPformers: Patch based inpainting with vision transformers for generalize paintings
Image inpainting is a field that has been traditionally attempted in the field of computer vision. After the development of deep learning, image inpainting has been advancing endlessly together with convolutional neural networks and generative adversarial networks. Thereafter, it has been expanded to various fields such as image filing through guiding and image inpainting using various masking. Furthermore, the field termed image out-painting has also been pioneered. Meanwhile, after the recent announcement of the vision transformer, various computer vision problems have been attempted using the vision transformer. In this paper, we are trying to solve the problem of image generalization painting using the vision transformer. This is an attempt to fill images with painting no matter whether the areas where painting is missing are in or out of the images, and without guiding. To that end, the painting problem was defined as a problem to drop images in patch units for easy use in the vision transformer. And we solved the problem with a simple network structure created by slightly modifying the vision transformer to fit the problem. We named this network PIPformers. PIPformers achieved better values than other papers compared to PSNR, RMSE and SSIM.
期刊介绍:
With the advent of very powerful PCs and high-end graphics cards, there has been an incredible development in Virtual Worlds, real-time computer animation and simulation, games. But at the same time, new and cheaper Virtual Reality devices have appeared allowing an interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans are now of an exceptional quality, which allows to use them in the movie industry. But this is only a beginning, as with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.