Efficient Feature Extraction for High-resolution Video Frame Interpolation

BMVC: Proceedings of the British Machine Vision Conference. Published: 2022-11-25. DOI: 10.48550/arXiv.2211.14005

M. Nottebaum, S. Roth, Simone Schaub-Meyer


Citations: 1

Abstract

Most deep learning methods for video frame interpolation consist of three main components: feature extraction, motion estimation, and image synthesis. Existing approaches differ mainly in how these modules are designed. However, when interpolating high-resolution images, e.g., at 4K, the design choices for achieving high accuracy within reasonable memory requirements are limited. The feature extraction layers compress the input and extract information relevant to later stages, such as motion estimation. However, these layers are often costly in parameters, computation time, and memory. We show how ideas from dimensionality reduction, combined with a lightweight optimization, can be used to compress the input representation while keeping the extracted information suitable for frame interpolation. Further, we require neither a pretrained flow network nor a synthesis network, which additionally reduces the number of trainable parameters and the required memory. When evaluating on three 4K benchmarks, we achieve state-of-the-art image quality among methods without pretrained flow while having the lowest network complexity and memory requirements overall.
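The abstract does not spell out which dimensionality-reduction technique is used. As a minimal sketch, assuming principal component analysis (PCA) over non-overlapping image patches as the reduction step (an illustrative assumption, not necessarily the authors' method), compressing a frame into a low-dimensional feature map could look like the following. The helper names `extract_patches`, `fit_pca`, and `compress`, the patch size `p`, and the component count `k` are hypothetical, and the lightweight optimization mentioned in the abstract is not modeled.

```python
import numpy as np

# Hypothetical sketch: PCA on image patches as a generic dimensionality-reduction
# feature extractor. Not the paper's implementation; no learned refinement is done.

def extract_patches(img: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping p x p patches,
    returning an (N, p*p*C) matrix with one flattened patch per row."""
    h, w, c = img.shape
    h, w = h - h % p, w - w % p  # crop so H and W are multiples of p
    x = img[:h, :w].reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p * c)

def fit_pca(patches: np.ndarray, k: int):
    """Return the mean patch and the top-k principal directions as a (k, D) array."""
    mean = patches.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:k]

def compress(img: np.ndarray, p: int, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project every patch onto the basis, giving an (H/p, W/p, k) feature map."""
    h, w, _ = img.shape
    codes = (extract_patches(img, p) - mean) @ basis.T
    return codes.reshape(h // p, w // p, -1)

# Usage: a small random "frame" stands in for a real 4K input.
rng = np.random.default_rng(0)
frame = rng.random((64, 96, 3)).astype(np.float32)
p, k = 4, 8  # each 4x4x3 = 48-value patch is reduced to 8 coefficients
mean, basis = fit_pca(extract_patches(frame, p), k)
feats = compress(frame, p, mean, basis)
print(feats.shape)  # (16, 24, 8)
```

Here each 48-value patch is replaced by 8 coefficients, a 6x reduction in stored values, with no trainable parameters involved. Applying the same projection to both input frames yields compact, aligned feature maps that a subsequent motion-estimation stage could consume.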

