Transformer-based image and video inpainting: current challenges and future directions

Impact Factor: 10.7 · CAS Tier 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Omar Elharrouss, Rafat Damseh, Abdelkader Nasreddine Belkacem, Elarbi Badidi, Abderrahmane Lakas
DOI: 10.1007/s10462-024-11075-9
Journal: Artificial Intelligence Review, vol. 58, no. 4
Publication date: 2025-02-05 (Journal Article)
Full text: https://link.springer.com/article/10.1007/s10462-024-11075-9
PDF: https://link.springer.com/content/pdf/10.1007/s10462-024-11075-9.pdf
Citations: 0

Abstract

Image inpainting is currently a hot topic within the field of computer vision. It offers a viable solution for various applications, including photographic restoration, video editing, and medical imaging. Deep learning advancements, notably convolutional neural networks (CNNs) and generative adversarial networks (GANs), have significantly enhanced the inpainting task, improving the capability to fill missing or damaged regions in an image or a video by incorporating contextually appropriate details. These advancements have also improved other aspects, including efficiency, information preservation, and the achievement of realistic textures and structures. Recently, Vision Transformers (ViTs) have been applied to image and video inpainting and offer further improvements. Transformer-based architectures, initially designed for natural language processing, have since been integrated into computer vision tasks. These methods use self-attention mechanisms that excel at capturing long-range dependencies within data; they are therefore particularly effective for tasks requiring a comprehensive understanding of the global context of an image or video. In this paper, we provide a comprehensive review of current image/video inpainting approaches, with a specific focus on Vision Transformer (ViT) techniques, with the goal of highlighting the significant improvements and providing a guideline for new researchers in the field of image/video inpainting using vision transformers. We categorize the transformer-based techniques by their architectural configurations, types of damage, and performance metrics. Furthermore, we present an organized synthesis of the current challenges and suggest directions for future research in the field of image or video inpainting.
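As an illustrative sketch (not taken from the paper), the global-context property of self-attention that the abstract highlights can be seen in a minimal NumPy example: zeroed "masked" patch tokens, standing in for missing image regions, receive information from every visible patch through a single attention step. All names and dimensions here are invented for the illustration.

```python
import numpy as np

def self_attention(tokens):
    """Scaled dot-product self-attention over a sequence of patch tokens.

    Every token attends to every other token, which is how
    transformer-based inpainting models propagate context from
    visible patches into masked (missing) patches. Projections for
    queries/keys/values are omitted for simplicity.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)          # (N, N) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ tokens                          # context-mixed tokens

# Toy "image": 6 patch embeddings of dimension 4; patches 2 and 3 are
# "masked" (a placeholder value, here zeros, marks missing regions).
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))
patches[2:4] = 0.0                                   # missing regions

out = self_attention(patches)
# The masked rows are now non-zero: each aggregated information from
# all visible patches in one global attention step, with no locality
# constraint like a CNN's receptive field.
print(np.abs(out[2:4]).sum() > 0)
```

Because the zeroed queries produce uniform attention weights, the masked positions simply average the whole sequence here; real inpainting transformers learn query/key/value projections so that masked tokens attend selectively to the most relevant visible regions.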

Source journal

Artificial Intelligence Review (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles per year: 194
Review time: 5.3 months
Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.