MDFormer: Multi-Scale Downsampling-Based Transformer for Low-Light Image Enhancement

IF 3.2 · Zone 2, Engineering & Technology · JCR Q2 (Engineering, Electrical & Electronic)

IEEE Signal Processing Letters · Pub Date: 2025-04-01 · DOI: 10.1109/LSP.2025.3556786

Yang Zhou;Liangtian He;Liang-Jian Deng;Hongming Chen;Chao Wang

IEEE Signal Processing Letters, vol. 32, pp. 1575-1579. Full text: https://ieeexplore.ieee.org/document/10946846/

Citations: 0

Abstract


MDFormer: Multi-Scale Downsampling-Based Transformer for Low-Light Image Enhancement

Vision Transformers have achieved impressive performance in low-light image enhancement. Some Transformer-based methods compute attention maps along the channel dimension, yet the spatial resolution of the queries and keys involved in the matrix multiplication is much larger than the number of channels. As a result, the key-query dot-product interaction that generates the attention maps incurs massive information redundancy and high computational cost. In addition, most previous feed-forward networks in Transformers do not model multi-range information, which plays an important role in feature reconstruction. Based on these observations, we propose an effective Multi-Scale Downsampling-Based Transformer (MDFormer) for low-light image enhancement, which consists of multi-scale downsampling-based self-attention (MDSA) and a multi-range gated extraction block (MGEB). MDSA downsamples queries and keys with two different factors to reduce the computational cost of self-attention along the channel dimension. Furthermore, we introduce learnable parameters that weight the two resulting attention maps during fusion, which allows MDSA to adaptively retain the most significant attention scores. The proposed MGEB captures multi-range information through multi-scale depth-wise convolutions and dilated convolutions, enhancing modeling capability. Extensive experiments on four challenging low-light image enhancement datasets demonstrate that our method outperforms state-of-the-art methods.
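The MDSA idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the downsampling operator, the normalization, or how the learnable fusion parameters are applied, so the average pooling, the L2 normalization, the softmax over `fusion_logits`, the factors `(2, 4)`, and every function name below are assumptions made for illustration. What the sketch does show is the cost argument: with a downsampling factor s, each entry of the C x C channel attention map is a dot product of length HW/s² rather than HW.

```python
import numpy as np


def avg_pool(x, s):
    """Average-pool a (C, H, W) array by integer factor s (H and W divisible by s)."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length (assumed normalization)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def channel_attention_map(q, k, s):
    """Build a C x C attention map from queries and keys downsampled by s."""
    c = q.shape[0]
    q_s = l2_normalize(avg_pool(q, s).reshape(c, -1))  # (C, HW / s^2)
    k_s = l2_normalize(avg_pool(k, s).reshape(c, -1))  # (C, HW / s^2)
    return softmax(q_s @ k_s.T, axis=-1)


def mdsa_sketch(q, k, v, factors=(2, 4), fusion_logits=(0.0, 0.0)):
    """Fuse two channel attention maps and apply them to full-resolution values.

    `fusion_logits` stands in for the learnable fusion parameters; turning
    them into weights with a softmax is an assumption, not the paper's rule.
    """
    c, h, w = v.shape
    weights = softmax(np.asarray(fusion_logits, dtype=float))
    attn = sum(wt * channel_attention_map(q, k, s)
               for wt, s in zip(weights, factors))
    out = attn @ v.reshape(c, -1)  # values keep their full spatial size
    return out.reshape(c, h, w), attn


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16, 16)) for _ in range(3))
out, attn = mdsa_sketch(q, k, v)
print(out.shape, attn.shape)  # (8, 16, 16) (8, 8)
```

Because each per-factor map is row-normalized and the fusion weights sum to one, the fused map is still row-stochastic; in a trained model the fusion logits would be updated by gradient descent rather than fixed at zero.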


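Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months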

About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.