Authors: Chen Hui; Debin Zhao; Weisi Lin; Shaohui Liu; Feng Jiang
DOI: 10.1109/TMM.2025.3535114
Journal: IEEE Transactions on Multimedia, vol. 27, pp. 4333-4347 (Q1, Computer Science, Information Systems; Impact Factor 9.7)
Published: 2025-03-03 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10908907/
Image Compressive Sensing With Scale-Variable Adaptive Sampling and Hybrid-Attention Transformer Reconstruction
Recently, a large number of image compressive sensing (CS) methods based on deep unfolding networks (DUNs) have been proposed. However, existing methods either use fixed-scale blocks for sampling, which yields limited insight into the image content, or employ a plain convolutional neural network (CNN) in each iteration, which weakens the perception of broader contextual priors. In this paper, we propose a novel DUN (dubbed SVASNet) for image compressive sensing, which achieves scale-variable adaptive sampling and hybrid-attention Transformer reconstruction with a single model. Specifically, for scale-variable sampling, a sampling matrix-based calculator is first employed to evaluate the reconstruction distortion; it requires only the measurements, without access to the ground-truth image. Then, a Block Scale Aggregation (BSA) strategy is presented to compute the reconstruction distortion under block divisions at different scales and to select the optimal division scale for sampling. To realize hybrid-attention reconstruction, a dual Cross Attention (CA) submodule in the gradient descent step and a Spatial Attention (SA) submodule in the proximal mapping step are developed. The CA submodule introduces inter-phase inertial forces into the gradient descent, which improves the memory effect between adjacent iterations. The SA submodule integrates the local and global prior representations of CNN and Transformer, and explores local and global affinities between dense feature representations. Extensive experimental results show that the proposed SVASNet achieves significant improvements over state-of-the-art methods.
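The unfolded structure the abstract describes, where each phase performs an inertial gradient descent step on the measurement-fidelity term followed by a proximal mapping under a learned prior, can be sketched in plain NumPy. This is a minimal illustration, not the paper's method: soft-thresholding stands in for the learned CA/SA attention modules, and all names, block sizes, and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_block(x_block, Phi):
    """Block-based CS sampling: compress a vectorized image block into measurements y = Phi x."""
    return Phi @ x_block

def phase(x, x_prev, y, Phi, rho=0.3, step=0.1, lam=0.01):
    """One unfolded phase: inertial gradient descent + proximal mapping."""
    # Inertial (momentum) term coupling adjacent phases -- a crude stand-in
    # for the paper's cross-attention "inter-phase inertial forces".
    z = x + rho * (x - x_prev)
    # Gradient step on the data-fidelity term 0.5 * ||Phi x - y||^2.
    g = z - step * Phi.T @ (Phi @ z - y)
    # Proximal mapping: plain soft-thresholding stands in for the learned
    # spatial-attention CNN/Transformer prior.
    return np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

# Demo: recover a sparse 64-dim block from 16 random measurements (25% sampling rate).
n, m = 64, 16
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = 1.0
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = sample_block(x_true, Phi)

x_prev, x = np.zeros(n), np.zeros(n)
for _ in range(500):
    x, x_prev = phase(x, x_prev, y, Phi), x

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

In SVASNet the fixed soft-threshold above is replaced by learned modules, and the block scale fed to `sample_block` would be chosen per region by the BSA strategy from the measurement-domain distortion estimate rather than fixed in advance.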
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.