Local Gaussian ensemble for arbitrary-scale image super-resolution
Authors: Chuan Chen, Weiwei Wang, Xixi Jia, Xiangchu Feng, Hanjia Wei
Journal: Computer Vision and Image Understanding, volume 257, Article 104372
Publication date: 2025-04-28
DOI: 10.1016/j.cviu.2025.104372
URL: https://www.sciencedirect.com/science/article/pii/S1077314225000955
Impact factor: 4.3 (JCR Q2, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
In arbitrary-scale image super-resolution (SR), local coordinate information is pivotal to improving performance through local ensemble. The earlier local implicit image function (LIIF) method reconstructs pixels with a multi-layer perceptron (MLP), then refines each pixel by a weighted summation of nearby pixels (the local ensemble), where each weight depends on the distance between the query pixel and the nearby pixel. Because these distances are fixed, so is the weighting mechanism, which limits the effectiveness of the local ensemble. Furthermore, the weighted summation requires repeated reconstructions, which increases the computational cost. Orthogonal position encoding SR (OPE-SR) reduces the complexity of pixel reconstruction by using orthogonal position encoding, but it still relies on LIIF’s local ensemble method. In addition, because it lacks scale information, OPE-SR performs unstably across datasets and scale factors. In this paper, we propose to conduct the local ensemble in the feature domain, and we present a new ensemble method, the local Gaussian ensemble (LGE), which uses local coordinate information more flexibly and efficiently. Specifically, we introduce learnable anisotropic 2D Gaussians for each query coordinate in the SR image, transforming the normalized coordinates of nearby features into multiple Gaussian weights that ensemble the local features. A scale-aware deep MLP is then applied only once for pixel reconstruction. Extensive experiments demonstrate that LGE significantly reduces computational costs during both training and inference while delivering performance comparable to the existing local ensemble method. Moreover, our method consistently outperforms the existing parameter-free approach in efficiency and stability across benchmark datasets and scale factors.
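The feature-domain ensemble described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the Gaussian parameterization, so the `(sigma_x, sigma_y, theta)` form, the function names, and the use of a single Gaussian per query (the paper mentions multiple) are all assumptions made for illustration. In a trained model the Gaussian parameters would be learned, and the blended feature would be passed once to the scale-aware MLP, which is omitted here.

```python
import math


def gaussian_weight(dx, dy, sigma_x, sigma_y, theta):
    """Unnormalized anisotropic 2D Gaussian evaluated at offset (dx, dy).

    Assumed parameterization: axis scales sigma_x, sigma_y and rotation theta.
    """
    # Rotate the offset into the Gaussian's principal axes.
    c, s = math.cos(theta), math.sin(theta)
    u = c * dx + s * dy
    v = -s * dx + c * dy
    return math.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2))


def local_gaussian_ensemble(features, offsets, params):
    """Blend nearby feature vectors using normalized Gaussian weights.

    features: K feature vectors (lists of floats), one per nearby feature
    offsets:  K (dx, dy) normalized offsets from the query to each feature
    params:   (sigma_x, sigma_y, theta); fixed here, learnable in a real model
    """
    raw = [gaussian_weight(dx, dy, *params) for dx, dy in offsets]
    total = sum(raw)
    weights = [w / total for w in raw]  # weights sum to 1
    dim = len(features[0])
    blended = [
        sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)
    ]
    return blended, weights


# Hypothetical example: four nearby features, query closer to the left pair.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
offsets = [(-0.25, -0.5), (-0.25, 0.5), (0.75, -0.5), (0.75, 0.5)]
# A Gaussian that is narrow along x and wide along y favors the left pair.
blended, weights = local_gaussian_ensemble(features, offsets, (0.2, 2.0, 0.0))
```

Because the weights are computed from coordinates alone and applied to features, the decoder only needs to run once per query pixel, which is the source of the cost saving the abstract claims over reconstructing and then blending several pixel predictions.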
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems