Retentive Spatial–Spectral Transformer for hyperspectral image denoising
Authors: Haitao Yin, Hao Chen, Jian Zhu
Journal: Infrared Physics & Technology, volume 151, Article 106139 (IF 3.4; JCR Q2, Instruments & Instrumentation)
Published: 2025-09-12
DOI: 10.1016/j.infrared.2025.106139
URL: https://www.sciencedirect.com/science/article/pii/S1350449525004323
The retention mechanism has emerged as a promising variant of the Transformer and has achieved remarkable success in natural language processing and computer vision. However, existing vision retention mechanisms exploit only a spatial prior, which limits their ability to represent the cubic spatial–spectral features of hyperspectral images (HSIs). To tackle this issue, we propose a Retentive Spatial–Spectral Transformer (RSST) for HSI denoising, which consists of the Retentive SpAtial Transformer (RSAT) block and the Retentive SpEctral Transformer (RSET) block. To enhance the adaptability of the spatial–spectral representation, the RSAT and RSET blocks integrate spatial and spectral priors into the self-attention mechanism; these priors are formulated as a spatial decay matrix based on two-dimensional Manhattan distance and a spectral decay matrix based on one-dimensional bidirectional distance, respectively. To further improve the representation of 3D local spatial–spectral features, an Irregular Separable 3D Convolution (IrS3DC) module is placed at the beginning of both the RSAT and RSET blocks. Additionally, RSST is configured as an asymmetric U-Net, in which the encoder blocks are implemented with RSAT blocks and the decoder blocks with RSET blocks. This asymmetric architecture can decouple spatial and spectral features, yielding high flexibility and low computational cost. Extensive experiments on various HSI datasets demonstrate that RSST outperforms state-of-the-art methods.
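The two decay matrices named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the decay rate `gamma`, the function names, and the choice to multiply softmax attention weights element-wise by the decay matrix (a pattern used in earlier retention-style vision models) are all hypothetical. Plain Python lists are used to keep the example dependency-free; a real model would use a tensor library.

```python
import math

def spatial_decay(height, width, gamma=0.9):
    """Decay over H*W pixel tokens: gamma ** (2D Manhattan distance).

    gamma is an assumed hyperparameter, not a value from the paper.
    """
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [[gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
            for (r1, c1) in coords]

def spectral_decay(bands, gamma=0.9):
    """Decay over band tokens: gamma ** |i - j|, applied in both directions."""
    return [[gamma ** abs(i - j) for j in range(bands)] for i in range(bands)]

def decayed_attention(q, k, v, decay):
    """Softmax attention whose weights are scaled element-wise by `decay`.

    Assumed combination rule; the paper may combine the prior differently.
    """
    dim = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(dim)
                  for k_j in k]
        top = max(scores)  # subtract the max for numerical stability
        exp_scores = [math.exp(s - top) for s in scores]
        total = sum(exp_scores)
        weights = [e / total * decay[i][j] for j, e in enumerate(exp_scores)]
        out.append([sum(w * v_j[d] for w, v_j in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

# Three band tokens with two features each, attended with a spectral prior.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = decayed_attention(tokens, tokens, tokens, spectral_decay(3))
```

In this sketch, tokens that are far apart (in pixels for the spatial matrix, in band index for the spectral matrix) contribute less to each other's output, which is the intuition behind encoding a distance prior as a decay matrix.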
About the journal:
The journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.