Fast Spatial-Temporal Transformer Network

R. Escher, Rodrigo Andrade de Bem, P. L. J. Drews
DOI: 10.1109/sibgrapi54419.2021.00018 (https://doi.org/10.1109/sibgrapi54419.2021.00018)
Published in: 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), October 2021
Citations: 1

Abstract

In computer vision, the restoration of missing regions in an image can be tackled with image inpainting techniques. Neural networks that perform inpainting in videos require the extraction of information from neighboring frames to obtain a temporally coherent result. The state-of-the-art methods for video inpainting are mainly based on Transformer Networks, which rely on attention mechanisms to handle temporal input data. However, such networks are highly costly, requiring considerable computational power for training and testing, which hinders their use on modest computing platforms. In this context, our goal is to reduce the computational complexity of state-of-the-art video inpainting methods, improving performance and facilitating their use on low-end GPUs. Therefore, we introduce the Fast Spatio-Temporal Transformer Network (FastSTTN), an extension of the Spatio-Temporal Transformer Network (STTN) in which the adoption of Reversible Layers reduces memory usage up to 7 times and execution time by approximately 2.2 times, while maintaining state-of-the-art video inpainting accuracy.
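The memory saving comes from the Reversible Layers the abstract mentions: in a RevNet-style reversible block, intermediate activations need not be stored for backpropagation because each layer's inputs can be reconstructed exactly from its outputs. The sketch below illustrates that mechanism in NumPy; the functions `F` and `G` are hypothetical stand-ins for the attention and feed-forward sub-layers (this is not the authors' implementation, only the general reversible-coupling scheme).

```python
import numpy as np

# Minimal sketch of a reversible residual block (RevNet-style coupling).
# F and G are placeholder transforms; in a transformer they would be the
# attention and feed-forward sub-layers operating on half the channels.
rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 4))
W_g = rng.standard_normal((4, 4))

def F(x):
    # stand-in for e.g. a spatio-temporal attention sub-layer
    return np.tanh(x @ W_f)

def G(x):
    # stand-in for e.g. a feed-forward sub-layer
    return np.tanh(x @ W_g)

def forward(x1, x2):
    # Standard reversible coupling: outputs determine inputs exactly,
    # so activations can be recomputed instead of cached (memory ↔ compute).
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Recover the block's inputs from its outputs during the backward pass.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```

Because the inverse is exact, a stack of such blocks stores only its final output during training, which is consistent with the roughly 7x memory reduction the paper reports for FastSTTN.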