A Simple Plugin for Transforming Images to Arbitrary Scales

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-07 DOI:10.48550/arXiv.2210.03417

Qinye Zhou, Zi-Hua Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang



Abstract

Existing super-resolution models are often specialized for a single scale, which fundamentally limits their use in practical scenarios. In this paper, we aim to develop a general plugin that can be inserted into existing super-resolution models, conveniently augmenting their ability towards Arbitrary Resolution Image Scaling, thus termed ARIS. We make the following contributions: (i) we propose a transformer-based plugin module that uses spatial coordinates as queries, iteratively attends to the low-resolution image features through cross-attention, and outputs the visual feature for each queried spatial location, resembling an implicit representation of images; (ii) we introduce a novel self-supervised training scheme that exploits consistency constraints to effectively augment the model's ability to upsample images to unseen scales, i.e., scales for which ground-truth high-resolution images are not available; (iii) without loss of generality, we inject the proposed ARIS plugin module into several existing models, namely IPT, SwinIR, and HAT, showing that the resulting models not only maintain their original performance at fixed scale factors but also extrapolate to unseen scales, substantially outperforming existing any-scale super-resolution models on standard benchmarks such as Urban100 and DIV2K.
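The coordinate-as-query idea in contribution (i) can be illustrated with a minimal sketch. It is not the ARIS implementation: the sinusoidal coordinate encoding, the single attention head, the absence of learned projections, the residual update, and all function names (`make_query_coords`, `encode_coord`, `cross_attend`, `upsample_features`) are assumptions chosen only to show how an output grid of arbitrary size can query a fixed set of low-resolution feature tokens.

```python
import math
import random

def make_query_coords(out_h, out_w):
    """Pixel-centre coordinates of an out_h x out_w grid, normalised to [-1, 1]."""
    return [((2 * i + 1) / out_h - 1, (2 * j + 1) / out_w - 1)
            for i in range(out_h) for j in range(out_w)]

def encode_coord(coord, dim):
    """Sinusoidal encoding of a (y, x) coordinate; dim must be a multiple of 4.
    Assumption: the abstract does not say how coordinates are embedded."""
    y, x = coord
    feats = []
    for k in range(dim // 4):
        freq = math.pi * (2 ** k)
        feats += [math.sin(freq * y), math.cos(freq * y),
                  math.sin(freq * x), math.cos(freq * x)]
    return feats

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention of one query over all tokens."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key))
                       for key in keys])
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def upsample_features(lr_tokens, out_h, out_w, num_iters=2):
    """Return one feature vector per output location (row-major order)."""
    dim = len(lr_tokens[0])
    outputs = []
    for coord in make_query_coords(out_h, out_w):
        query = encode_coord(coord, dim)
        for _ in range(num_iters):
            # Iteratively refine the query by attending to the LR features.
            attended = cross_attend(query, lr_tokens, lr_tokens)
            query = [q + a for q, a in zip(query, attended)]  # residual update
        outputs.append(query)
    return outputs

# Toy usage: a 4x4 grid of 8-dimensional LR features, queried on a 6x10 grid,
# i.e. non-integer and anisotropic scale factors of 1.5 and 2.5.
random.seed(0)
lr_tokens = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4 * 4)]
hr_features = upsample_features(lr_tokens, out_h=6, out_w=10)
```

Because the output size enters only through the set of query coordinates, the same low-resolution tokens can be queried at any target resolution; in a trained model, learned projections and a decoder to pixel values would replace the untrained arithmetic used here.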

