A method based on hybrid cross-multiscale spectral-spatial transformer network for hyperspectral and multispectral image fusion

IF 7.5 · CAS Zone 1, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications · Pub Date: 2024-11-10 · DOI: 10.1016/j.eswa.2024.125742

Yingxia Chen, Mingming Wei, Yan Chen

Volume 263, Article 125742 · https://www.sciencedirect.com/science/article/pii/S0957417424026095

Citations: 0

Abstract



Convolutional neural networks (CNNs) have made a significant contribution to hyperspectral image (HSI) generation. However, because their receptive fields are local, CNNs struggle to capture long-range dependencies, which can introduce distortions into fused images. Transformers excel at capturing long-range dependencies but have limited capacity for handling fine details. Additionally, prior work has often overlooked the extraction of global features during the image preprocessing stage, which can cause fine details to be lost. To address these issues, we propose a hybrid cross-multiscale spectral-spatial Transformer (HCMSST) that combines the strength of CNNs in feature extraction with the strength of Transformers in capturing long-range dependencies. To fully extract and retain local and global information in the shallow feature extraction phase, the network combines CNNs with a staggered cascade-dense residual block (SCDRB). This block uses staggered residual connections to link features directly both within and between branches, and integrates attention modules to strengthen the response to important features. This design lets information flow freely between branches and yields deeper feature representations. To address the limitations of Transformers in processing fine details, we introduce multiscale spatial-spectral coding-decoding structures that produce comprehensive spatial-spectral features, which the cross-multiscale spectral-spatial Transformer (CMSST) then uses to capture long-range dependencies. In addition, the CMSST adopts a cross-level dual-stream feature interaction strategy: it fuses spatial and spectral features from different levels and feeds the fused features back into the corresponding branches for information interaction. Experimental results show that HCMSST outperforms many state-of-the-art (SOTA) methods. Specifically, relative to the SOTA methods, HCMSST reduces ERGAS by 3.05% on the CAVE dataset and by 2.69% on the Harvard dataset.
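The abstract does not give the CMSST equations, so the following is only a hypothetical NumPy sketch of the two generic attention types a spectral-spatial Transformer can combine: spatial attention, with one token per pixel, and spectral attention, with one token per band. It also shows a simple dual-stream interaction that fuses the two streams and feeds the result back into each. The function names, the identity Q/K/V projections, and the averaging fusion are all assumptions made for illustration; this is not the authors' HCMSST.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    """Self-attention with one token per pixel: H*W tokens of dimension C."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T                 # (H*W, C)
    # Identity Q/K/V projections keep the sketch short; real models learn them.
    scores = tokens @ tokens.T / np.sqrt(c)           # (H*W, H*W): grows with image size
    out = softmax(scores, axis=-1) @ tokens           # (H*W, C)
    return out.T.reshape(c, h, w)

def spectral_attention(feat):
    """Self-attention with one token per band: C tokens of dimension H*W."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w)                   # (C, H*W)
    scores = tokens @ tokens.T / np.sqrt(h * w)       # (C, C): grows with band count
    out = softmax(scores, axis=-1) @ tokens           # (C, H*W)
    return out.reshape(c, h, w)

def dual_stream_interaction(spa_feat, spe_feat):
    """Hypothetical fusion: average both streams, then add the fused map
    back to each stream as a residual so each branch sees the other's output."""
    fused = 0.5 * (spa_feat + spe_feat)
    return spa_feat + fused, spe_feat + fused

rng = np.random.default_rng(0)
x = rng.standard_normal((31, 8, 8))    # synthetic 31-band feature map, 8x8 pixels
spa, spe = dual_stream_interaction(spatial_attention(x), spectral_attention(x))
print(spa.shape, spe.shape)            # (31, 8, 8) (31, 8, 8)
```

Spatial attention builds an (H·W)×(H·W) score matrix, while spectral attention builds only a C×C one, which is one reason spectral-wise attention is often attractive for full-resolution hyperspectral cubes.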
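ERGAS (Erreur Relative Globale Adimensionnelle de Synthèse) is the metric behind the reported gains, and lower values are better. A common definition is ERGAS = (100 / r) · sqrt((1/L) · Σ_l (RMSE_l / μ_l)²), where r is the spatial resolution ratio between the fused image and the low-resolution HSI, L is the number of bands, RMSE_l is the root-mean-square error of band l, and μ_l is the mean of reference band l. The function below is a minimal sketch of that definition; the name `ergas` and the (bands, height, width) layout are assumptions, and evaluation code used in specific papers may differ in details such as value range or ratio convention.

```python
import numpy as np

def ergas(reference, fused, scale):
    """ERGAS between two cubes shaped (bands, height, width); lower is better.

    `scale` is the spatial resolution ratio between the fused output and the
    low-resolution HSI input (for example 8 for an 8x downsampling setup).
    """
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(fused, dtype=np.float64)
    if ref.shape != est.shape:
        raise ValueError("reference and fused cubes must have the same shape")
    # Per-band RMSE between the reference and fused images.
    rmse = np.sqrt(np.mean((ref - est) ** 2, axis=(1, 2)))
    # Per-band mean of the reference image normalizes each band's error.
    mean = np.mean(ref, axis=(1, 2))
    return 100.0 / scale * np.sqrt(np.mean((rmse / mean) ** 2))

ref = np.full((31, 4, 4), 2.0)                       # synthetic constant cube
print(ergas(ref, ref, scale=8))                       # 0.0 (perfect reconstruction)
print(ergas(ref, np.full((31, 4, 4), 2.2), scale=8))  # about 1.25 (10% error per band)
```

Dividing each band's RMSE by that band's mean makes the score insensitive to per-band brightness, so dim and bright bands contribute comparably.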


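Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months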

Journal description: Expert Systems with Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans sectors such as finance, engineering, marketing, law, project management, information management, and medicine. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.