Fast Spatial-Temporal Transformer Network

2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). Pub Date: 2021-10-01. DOI: 10.1109/sibgrapi54419.2021.00018

R. Escher, Rodrigo Andrade de Bem, P. L. J. Drews


Citations: 1

Abstract

In computer vision, the restoration of missing regions in an image can be tackled with image inpainting techniques. Neural networks that perform inpainting in videos must extract information from neighboring frames to obtain a temporally coherent result. The state-of-the-art methods for video inpainting are mainly based on Transformer Networks, which rely on attention mechanisms to handle temporal input data. However, such networks are costly, requiring considerable computational power for training and testing, which hinders their use on modest computing platforms. In this context, our goal is to reduce the computational complexity of state-of-the-art video inpainting methods, improving performance and facilitating their use on low-end GPUs. Therefore, we introduce the Fast Spatio-Temporal Transformer Network (FastSTTN), an extension of the Spatio-Temporal Transformer Network (STTN) in which the adoption of Reversible Layers reduces memory usage by up to 7 times and execution time by approximately 2.2 times, while maintaining state-of-the-art video inpainting accuracy.
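The abstract does not spell out how the Reversible Layers are built, so the following is only a minimal sketch of the general idea, assuming the widely used additive-coupling form of a reversible block. The functions `f` and `g` are hypothetical stand-ins for the sub-layers (for example, attention and feed-forward) and are not taken from the paper. The sketch shows why memory can drop: the block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be stored for the backward pass.

```python
import math

def f(x):
    # Hypothetical stand-in for one sub-layer (e.g. an attention block).
    return [math.tanh(v) for v in x]

def g(x):
    # Hypothetical stand-in for the other sub-layer (e.g. a feed-forward block).
    return [0.5 * v for v in x]

def forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the two updates in reverse order to recover the inputs.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

x1, x2 = [0.1, -0.4, 2.0], [1.5, 0.0, -0.7]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)

# Largest reconstruction error across both halves (floating-point rounding only).
max_err = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
```

In a training framework, the backward pass of such a block would call the inverse to rebuild its inputs on the fly instead of caching them, trading a little extra computation for lower memory use.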

