Densely aggregated U-net with spatial-spectral interaction transformer for hyperspectral compressed imaging reconstruction

IF 3.1 · CAS Zone 4, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Yun-Hui Li
Journal of Visual Communication and Image Representation, Volume 117, Article 104795. Published online 27 March 2026 (issue dated April 2026). DOI: 10.1016/j.jvcir.2026.104795
Citations: 0

Abstract

Hyperspectral imaging offers critical spectral information for applications such as material analysis and camouflage recognition. However, the acquisition of hyperspectral data cubes is inherently constrained by the Nyquist sampling theorem. While compressed sensing theory enables snapshot imaging by compressing the data cube into a 2D measurement, the ill-posed reconstruction remains a significant challenge. Recent deep learning methods, particularly vision transformers, have advanced the state-of-the-art (SOTA). Despite this, existing networks typically employ spectral or spatial self-attentions in isolation, blindly pursuing a global receptive field at the cost of computational efficiency and representational flexibility. Additionally, the vanilla skip connection in U-Nets is insufficient for effective multi-scale information transmission between encoder and decoder. To address these issues, we propose a Densely aggregated U-Net with a Spatial-Spectral Interaction Transformer (DSST). DSST parallelizes patch-based spectral self-attention and window-based spatial self-attention, complemented by an interaction mechanism. Furthermore, it introduces a densely aggregated skip connection to collect multi-scale features and bridge the semantic gap. Experimental results on both simulated and real-world scenes demonstrate that DSST achieves competitive performance with lower computational and memory costs compared to other end-to-end networks. Moreover, it offers faster inference speeds than deep unfolding networks.
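The abstract names two attention granularities run in parallel — spectral self-attention over bands and window-based spatial self-attention over pixels — joined by an interaction mechanism. The paper's implementation details are not given here, so the following is only a minimal NumPy sketch of that general idea; in particular, the sigmoid-gated `interaction_fuse` is a hypothetical stand-in for whatever interaction the authors actually use.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Plain scaled dot-product self-attention; tokens: (N, D)."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    return softmax(scores, axis=-1) @ tokens

def spectral_attention(cube):
    """Spectral branch: each band is one token (features = flattened pixels)."""
    H, W, C = cube.shape
    tokens = cube.reshape(H * W, C).T            # (C, H*W): C band tokens
    return self_attention(tokens).T.reshape(H, W, C)

def spatial_window_attention(cube, win=4):
    """Spatial branch: tokens are the pixels inside each non-overlapping window."""
    H, W, C = cube.shape
    out = np.empty_like(cube)
    for i in range(0, H, win):
        for j in range(0, W, win):
            patch = cube[i:i + win, j:j + win]   # (win, win, C)
            toks = patch.reshape(-1, C)          # (win*win, C) pixel tokens
            out[i:i + win, j:j + win] = self_attention(toks).reshape(patch.shape)
    return out

def interaction_fuse(spec, spat):
    """Hypothetical interaction: each branch's sigmoid gate modulates the other."""
    gate_spec = 1.0 / (1.0 + np.exp(-spec))
    gate_spat = 1.0 / (1.0 + np.exp(-spat))
    return spec * gate_spat + spat * gate_spec

# Toy cube: 8x8 spatial grid, 6 spectral bands.
cube = np.random.default_rng(0).standard_normal((8, 8, 6))
fused = interaction_fuse(spectral_attention(cube), spatial_window_attention(cube))
print(fused.shape)  # (8, 8, 6)
```

Note the complexity argument implicit in the abstract: spectral attention costs O(C²) in the number of bands and window attention costs O(win⁴) per window, both far cheaper than global spatial attention's O((HW)²), which is the efficiency trade-off the parallel design exploits.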
Source journal: Journal of Visual Communication and Image Representation (Engineering & Technology – Computer Science: Software Engineering)
CiteScore: 5.40
Self-citation rate: 11.50%
Articles per year: 188
Review time: 9.9 months
Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.