Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

IF 8.4 · CAS Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia, vol. 27, pp. 1783-1796 · Pub Date: 2024-12-30 · DOI: 10.1109/TMM.2024.3521798

Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin


Citations: 0

Abstract

Recent remote sensing image (RSI) super-resolution (SR) methods based on deep neural networks, e.g., Convolutional Neural Networks and Transformers, have achieved remarkable performance. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs for large-scale RSI. To alleviate these issues, we make the first attempt to integrate the Vision State Space Model (Mamba) into RSI-SR; Mamba is well suited to large-scale RSI because it captures long-range dependencies with linear complexity. To achieve better SR reconstruction, we build upon Mamba and devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore spatial and frequency correlations. In particular, FMSR features a multi-level fusion architecture equipped with a Frequency Selection Module (FSM), a Vision State Space Module (VSSM), and a Hybrid Gate Module (HGM), combining their strengths for effective spatial-frequency fusion. Considering that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features via learnable scaling adaptors for accurate feature fusion. Extensive experiments on the AID, DOTA, and DIOR benchmarks demonstrate that FMSR outperforms the state-of-the-art Transformer-based method HAT-L by 0.11 dB PSNR on average, while requiring only 28.05% of its memory consumption and 19.08% of its computational complexity.
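To make the abstract's linear-versus-quadratic argument and the idea of scaled multi-branch fusion concrete, here is a toy, single-channel Python sketch. It is not the FMSR implementation, and every name and parameter in it (flatten_raster, ssm_scan, local_conv3, low_pass, fuse, token_interactions, a, b, c, alpha, beta, gamma, keep) is hypothetical. A feature map is flattened in raster order and passed through a scalar linear state-space recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t, which visits each of the L tokens once, whereas full self-attention scores all L^2 token pairs. A 3-tap convolution stands in for a local branch, a DFT low-pass mask stands in for frequency selection, and plain scalar weights stand in for learnable scaling adaptors. Real Mamba-style vision blocks use input-dependent (selective) parameters, multi-channel states, and multi-directional scans, and a practical frequency branch would use an FFT instead of the naive O(L^2) DFT used here for brevity.

```python
import cmath
from typing import List, Sequence, Tuple


def flatten_raster(fmap: Sequence[Sequence[float]]) -> List[float]:
    """Flatten an H x W single-channel feature map row by row (raster order)."""
    return [v for row in fmap for v in row]


def ssm_scan(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One pass over the sequence, so the cost grows linearly with its length L.
    The state h accumulates all earlier tokens (decaying by a each step),
    which is what makes this a global branch.
    """
    h = 0.0
    out = []
    for xt in x:
        h = a * h + b * xt
        out.append(c * h)
    return out


def local_conv3(x: Sequence[float],
                w: Tuple[float, float, float] = (0.25, 0.5, 0.25)) -> List[float]:
    """3-tap zero-padded 1-D convolution: a stand-in for a local (CNN-like) branch."""
    p = [0.0] + list(x) + [0.0]
    return [w[0] * p[i] + w[1] * p[i + 1] + w[2] * p[i + 2] for i in range(len(x))]


def low_pass(x: Sequence[float], keep: int) -> List[float]:
    """Toy frequency selection: DFT, keep DC plus the `keep` lowest frequencies
    (and their conjugate mirrors), then inverse DFT. Naive O(L^2) DFT for clarity."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
    kept = [s if (k <= keep or k >= n - keep) else 0.0 for k, s in enumerate(spec)]
    return [(sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def fuse(x: Sequence[float], alpha: float, beta: float, gamma: float,
         keep: int = 2) -> List[float]:
    """Residual fusion of global, local, and frequency branches with scalar weights.

    Hypothetical stand-in for 'learnable scaling adaptors': in a trained network
    alpha, beta, gamma would be learned parameters; here they are plain floats.
    """
    g = ssm_scan(x, a=0.9, b=0.1, c=1.0)
    l = local_conv3(x)
    f = low_pass(x, keep)
    return [xi + alpha * gi + beta * li + gamma * fi
            for xi, gi, li, fi in zip(x, g, l, f)]


def token_interactions(h: int, w: int) -> Tuple[int, int]:
    """Recurrence steps for one scan vs. pairwise scores for full self-attention."""
    L = h * w
    return L, L * L


if __name__ == "__main__":
    fmap = [[0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 2.0, 0.0]]
    seq = flatten_raster(fmap)
    out = fuse(seq, alpha=0.5, beta=0.3, gamma=0.2)
    print(len(out))                    # 8
    print(token_interactions(64, 64))  # (4096, 16777216)
```

With a 64 x 64 feature map, one scan performs 4,096 recurrence steps while full self-attention computes 16,777,216 pairwise scores, which is the linear-versus-quadratic gap the abstract refers to.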




Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Publication volume: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors, ensuring comprehensive coverage of multimedia research.