Yanheng Lv, Lulu Pan, Ke Xu, Guo Li, Wenbo Zhang, Lingxiao Li, Le Lei
{"title":"Enhanced local multi-windows attention network for lightweight image super-resolution","authors":"Yanheng Lv, Lulu Pan, Ke Xu, Guo Li, Wenbo Zhang, Lingxiao Li, Le Lei","doi":"10.1016/j.cviu.2024.104217","DOIUrl":null,"url":null,"abstract":"<div><div>Since the global self-attention mechanism can capture long-distance dependencies well, Transformer-based methods have achieved remarkable performance in many vision tasks, including single-image super-resolution (SISR). However, there are strong local self-similarities in images, if the global self-attention mechanism is still used for image processing, it may lead to excessive use of computing resources on parts of the image with weak correlation. Especially in the high-resolution large-size image, the global self-attention will lead to a large number of redundant calculations. To solve this problem, we propose the Enhanced Local Multi-windows Attention Network (ELMA), which contains two main designs. First, different from the traditional self-attention based on square window partition, we propose a Multi-windows Self-Attention (M-WSA) which uses a new window partitioning mechanism to obtain different types of local long-distance dependencies. Compared with original self-attention mechanisms commonly used in other SR networks, M-WSA reduces computational complexity and achieves superior performance through analysis and experiments. Secondly, we propose a Spatial Gated Network (SGN) as a feed-forward network, which can effectively overcome the problem of intermediate channel redundancy in traditional MLP, thereby improving the parameter utilization and computational efficiency of the network. Meanwhile, SGN introduces spatial information into the feed-forward network that traditional MLP cannot obtain. It can better understand and use the spatial structure information in the image, and enhances the network performance and generalization ability. Extensive experiments show that ELMA achieves competitive performance compared to state-of-the-art methods while maintaining fewer parameters and computational costs.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"250 ","pages":"Article 104217"},"PeriodicalIF":4.3000,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314224002984","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Since the global self-attention mechanism can capture long-distance dependencies well, Transformer-based methods have achieved remarkable performance in many vision tasks, including single-image super-resolution (SISR). However, there are strong local self-similarities in images, if the global self-attention mechanism is still used for image processing, it may lead to excessive use of computing resources on parts of the image with weak correlation. Especially in the high-resolution large-size image, the global self-attention will lead to a large number of redundant calculations. To solve this problem, we propose the Enhanced Local Multi-windows Attention Network (ELMA), which contains two main designs. First, different from the traditional self-attention based on square window partition, we propose a Multi-windows Self-Attention (M-WSA) which uses a new window partitioning mechanism to obtain different types of local long-distance dependencies. Compared with original self-attention mechanisms commonly used in other SR networks, M-WSA reduces computational complexity and achieves superior performance through analysis and experiments. Secondly, we propose a Spatial Gated Network (SGN) as a feed-forward network, which can effectively overcome the problem of intermediate channel redundancy in traditional MLP, thereby improving the parameter utilization and computational efficiency of the network. Meanwhile, SGN introduces spatial information into the feed-forward network that traditional MLP cannot obtain. It can better understand and use the spatial structure information in the image, and enhances the network performance and generalization ability. Extensive experiments show that ELMA achieves competitive performance compared to state-of-the-art methods while maintaining fewer parameters and computational costs.
期刊介绍:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems