PIPformers: Patch based inpainting with vision transformers for generalize paintings

IF 1.7, Zone 4 Computer Science, JCR Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds Pub Date : 2024-05-17 DOI:10.1002/cav.2270

Jeyoung Lee, Hochul Kang

Volume 35, Issue 3. Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1002/cav.2270

Citations: 0

Abstract


Image inpainting is a long-standing problem in computer vision. Since the rise of deep learning, it has advanced steadily alongside convolutional neural networks and generative adversarial networks. It has since expanded into areas such as guided image filling and inpainting under various masking schemes, and the related task of image outpainting has also emerged. Meanwhile, since the introduction of the vision transformer, it has been applied to a wide range of computer vision problems. In this paper, we use the vision transformer to address generalized image painting: filling in an image regardless of whether the missing areas lie inside or outside it, and without guidance. To that end, we define the painting problem as one of dropping parts of an image in patch units, which makes it easy to handle with the vision transformer. We solve the problem with a simple network structure obtained by slightly modifying the vision transformer, which we name PIPformers. PIPformers achieved better PSNR, RMSE, and SSIM values than the methods reported in other papers.
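The patch-dropping formulation and two of the reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the patch size, how patches are chosen, or what value replaces a dropped patch, so the function name `drop_patches`, the uniform random selection, the zero fill, and the `[0, 1]` pixel range are all assumptions made for illustration. Dropped patches in the interior correspond to an inpainting-style gap, and dropped patches on the border to an outpainting-style gap.

```python
import math
import random


def drop_patches(image, patch, drop_ratio, seed=0):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    image: 2D list of floats (H x W); H and W must be divisible by patch.
    Returns (masked_image, dropped), where dropped is a set of
    (patch_row, patch_col) indices. Selection rule and zero fill are
    assumptions, not taken from the paper.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    rows, cols = h // patch, w // patch
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    rng = random.Random(seed)
    n_drop = int(round(drop_ratio * len(coords)))
    dropped = set(rng.sample(coords, n_drop))
    masked = [row[:] for row in image]  # copy so the input stays intact
    for r, c in dropped:
        for y in range(r * patch, (r + 1) * patch):
            for x in range(c * patch, (c + 1) * patch):
                masked[y][x] = 0.0
    return masked, dropped


def rmse(a, b):
    """Root-mean-square error between two equally sized 2D lists."""
    n = len(a) * len(a[0])
    sq = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return math.sqrt(sq / n)


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = rmse(a, b)
    return math.inf if err == 0 else 20 * math.log10(max_value / err)


if __name__ == "__main__":
    img = [[0.5] * 8 for _ in range(8)]          # toy 8x8 grayscale image
    masked, dropped = drop_patches(img, patch=4, drop_ratio=0.5)
    print(sorted(dropped), rmse(img, masked), psnr(img, masked))
```

In a full pipeline, the metrics would compare the ground-truth image with a model's reconstruction of the dropped patches; here they are applied to the masked input only to show how the numbers are computed.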


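Source journal

Computer Animation and Virtual Worlds (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 2.20
Self-citation rate: 0.00%
Articles published:
Review time: 6-12 weeks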

Journal description: With the advent of very powerful PCs and high-end graphics cards, there has been remarkable progress in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with the real world through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only a beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.