Yang Zhou;Liangtian He;Liang-Jian Deng;Hongming Chen;Chao Wang
DOI: 10.1109/LSP.2025.3556786
IEEE Signal Processing Letters, vol. 32, pp. 1575-1579, April 2025 (Impact Factor: 3.2)
https://ieeexplore.ieee.org/document/10946846/
MDFormer: Multi-Scale Downsampling-Based Transformer for Low-Light Image Enhancement
Vision Transformers have achieved impressive performance in low-light image enhancement. Some Transformer-based methods compute attention maps along the channel dimension, yet the spatial resolution of the queries and keys involved in the matrix multiplication is much larger than the number of channels, so the key-query dot-product interaction that generates the attention maps incurs massive information redundancy and expensive computational costs. Moreover, most previous feed-forward networks in Transformers do not model the multi-range information that plays an important role in feature reconstruction. Based on these observations, we propose an effective Multi-Scale Downsampling-Based Transformer (MDFormer) for low-light image enhancement, which consists of a multi-scale downsampling-based self-attention (MDSA) and a multi-range gated extraction block (MGEB). MDSA downsamples the queries and keys with two different factors to reduce the computational cost of self-attention along the channel dimension. Furthermore, we introduce learnable parameters for the two generated attention maps to adjust their fusion weights, which allows MDSA to adaptively retain the most significant attention scores. The proposed MGEB captures multi-range information via multi-scale depth-wise convolutions and dilated convolutions to strengthen modeling capability. Extensive experiments on four challenging low-light image enhancement datasets demonstrate that our method outperforms the state of the art.
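The core idea of MDSA — channel-dimension attention whose queries and keys are spatially downsampled by two different factors, with the two resulting attention maps fused by learnable scalar weights — can be sketched as follows. This is a minimal NumPy illustration of the mechanism described in the abstract, not the authors' implementation: the learned Q/K/V projections are omitted, `downsampled_channel_attention_map`, `mdsa_sketch`, and the pooling factors `f1`/`f2` are hypothetical names and values, and average pooling is assumed as the downsampling operator.

```python
import numpy as np

def downsampled_channel_attention_map(x, factor):
    """Channel-dimension (C x C) attention map from spatially
    average-pooled queries and keys; pooling shrinks the H*W axis
    of the dot product, which is where the cost saving comes from."""
    C, H, W = x.shape
    assert H % factor == 0 and W % factor == 0
    # average-pool the spatial dimensions by `factor`
    pooled = x.reshape(C, H // factor, factor, W // factor, factor).mean(axis=(2, 4))
    q = pooled.reshape(C, -1)
    k = q  # sketch only: the separate learned Q/K projections are omitted
    attn = q @ k.T / np.sqrt(q.shape[1])            # (C, C) dot product
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))  # stable softmax
    return attn / attn.sum(axis=-1, keepdims=True)

def mdsa_sketch(x, f1=2, f2=4, alpha=0.5, beta=0.5):
    """Fuse the two attention maps (factors f1, f2) with scalar
    weights alpha/beta (learnable parameters in the paper), then
    apply the fused map to the full-resolution values."""
    C, H, W = x.shape
    fused = (alpha * downsampled_channel_attention_map(x, f1)
             + beta * downsampled_channel_attention_map(x, f2))
    v = x.reshape(C, -1)                             # V keeps full resolution
    return (fused @ v).reshape(C, H, W)

x = np.random.default_rng(0).standard_normal((8, 16, 16))
y = mdsa_sketch(x)
print(y.shape)  # (8, 16, 16)
```

Because the attention map is C x C rather than HW x HW, its size is independent of image resolution; downsampling Q and K further reduces the cost of forming it, while V stays at full resolution so no output detail is discarded.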
Journal introduction:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.